What's the plan for ISO 639-3 and RFC 3066 ter?

John Cowan jcowan at reutershealth.com
Mon Aug 16 19:18:37 CEST 2004


Addison Phillips [wM] scripsit:

> The question is: does ISO 639-3 supersede ISO 639-2 as the source for
> three letter codes? Or not?

Mu.

> If 639-3 is a strict superset, then the additional three letter codes
> could just be admitted as language subtags. In fact, I'm given to
> understand from Peter's prior explanations that this should be the
> goal for most of the 639-3 codes.

The situation as I understand it will be as follows:

1) The ISO 639 standard will draw on a single pool of three-letter codes.
No three-letter code will be used for more than one purpose.

2) ISO 639-3 will assign codes to individual languages and
macro-languages.  These codes will be identical to the existing 639-2
codes where they exist; where there is no existing 639-2 code, they will
be identical to Ethnologue 14th edition codes where possible.

3) ISO 639-5 will assign codes to language collections.  These codes
will be identical to the existing 639-2 codes where they exist.

4) ISO 639-3 and ISO 639-5 codes will be disjoint.

5) ISO 639-2 will specify a subset of the union of ISO 639-3 and ISO 639-5
codes:  those that identify languages meeting the restrictions of ISO 639-2
(basically, that there are at least fifty documents in the language,
held by at most five organizations).

6) ISO 639-1 will continue to specify a subset of ISO 639-2, and will
assign two-letter codes to its members.  Except for a transitional period
after the promulgation of ISO 639-3 and ISO 639-5, it will effectively
become a closed collection.

(ISO 639-4 will explain all this, and will not define any codes.)
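
Points 1 through 6 amount to a handful of set relations, which can be
sketched in Python.  The code memberships below are toy sample data chosen
for illustration, not the contents of any ISO table, and the names
ISO_639_3, ISO_639_5, ISO_639_2, ISO_639_1, and check_invariants are
hypothetical.

```python
# Toy sample data, NOT the real ISO tables (assumption for illustration only).
ISO_639_3 = {"eng", "fra", "zho", "nan", "gsc"}  # individual languages and macro-languages
ISO_639_5 = {"day"}                               # language collections
ISO_639_2 = {"eng", "fra", "zho", "day"}          # codes meeting the 639-2 restrictions
ISO_639_1 = {"en": "eng", "fr": "fra", "zh": "zho"}  # two-letter -> three-letter


def check_invariants(part3, part5, part2, part1):
    """Return the list of violated relations; an empty list means consistent."""
    problems = []
    # Point 4: 639-3 and 639-5 share no codes.
    if part3 & part5:
        problems.append("639-3 and 639-5 overlap")
    # Point 5: 639-2 is a subset of the union of 639-3 and 639-5.
    if not part2 <= (part3 | part5):
        problems.append("639-2 is not a subset of 639-3 union 639-5")
    # Point 6: every two-letter code names a member of 639-2.
    if not set(part1.values()) <= part2:
        problems.append("639-1 names a code outside 639-2")
    return problems


print(check_invariants(ISO_639_3, ISO_639_5, ISO_639_2, ISO_639_1))
```

With the sample data every relation holds, so the returned list is empty;
putting the same code in both the 639-3 and 639-5 sets makes the first
check fail.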

> The need for extlang subtags would then be muted (and might even be
> eliminated). Only language codes that had "macro languages" associated
> with them could be registered as extlangs. In fact, these subtags
> might be cherry picked on an as-needed basis (rather than having a
> full-fledged formal source).

ISO 639-3 will provide a mapping between macro-languages and the
individual languages that are parts of them.  I don't know (and it may
not have been decided) whether ISO 639-5 will provide a mapping between
collective codes and the languages covered by them.

> Canonicalizing and matching the tags in this situation would be much
> more complicated:
> 
> zh-min-nan // ignore the min problem for a second
> zh-nan
> nan

This case and its twin, zh-min-bei, are the most complex.  The vast
majority of all macro-languages do not contain other macro-languages
(as zh contains min).  Indeed, it is doubtful whether ISO 639-3
will provide nested macro-languages at all.

Let us consider more straightforward cases.  I am assuming throughout
that Peter's recommendations for changes to 639-2 are accepted.

A) Currently, the macro-language "Occitan" is encoded as oc.  There
will be four individual languages corresponding to this macro-language
in 639-3:  Auvergnat (auv), Gascon (gsc), Languedocien (lnc), and
Limousin (lms).

B) Currently, the collective "Land Dayak languages" is encoded as day.
(It's not marked as a collective in ISO 639-2, but Peter has proposed
that it be changed to a collective).  There are 16 languages in this
collective.
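
Case A can be pictured as a small lookup table.  The sketch below, in
Python, holds the macro-language to individual-language mapping for
Occitan as listed above and derives the inverse lookup.  The names
MACRO_TO_INDIVIDUAL, INDIVIDUAL_TO_MACRO, and parent_of are hypothetical,
and nothing is filled in for day because its 16 member languages are not
listed here.

```python
# Macro-language -> individual languages, taken from case A above.
MACRO_TO_INDIVIDUAL = {
    "oc": ["auv", "gsc", "lnc", "lms"],
}

# Inverse table: individual language -> macro-language.
INDIVIDUAL_TO_MACRO = {
    individual: macro
    for macro, members in MACRO_TO_INDIVIDUAL.items()
    for individual in members
}


def parent_of(code):
    """Return the macro-language containing `code`, or None if there is none."""
    return INDIVIDUAL_TO_MACRO.get(code)
```

Here parent_of("gsc") yields "oc", while parent_of("oc") yields None, since
oc is not itself a member of any macro-language in this table.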

In each case, we have three possibilities:

1) Allow the individual language codes;

2) Allow the higher-order code extended by an individual language code;

3) Allow both 1 and 2 as synonyms.

Accepting 1 means that systems which consume resources labeled oc must
now also be prepared to consume resources labeled auv, gsc, lnc, and lms.
Accepting 2 allows normal fallback behavior to work:  oc-auv will be
recognized as oc automatically.  Accepting 3 means that some normalization
scheme must be provided.  All three possibilities have drawbacks.
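
The trade-off can be made concrete with a sketch in Python.  It assumes
prefix-style fallback matching (a range matches a tag that equals it or
extends it with a hyphen), and the names PARENT, matches, and normalize
are hypothetical; the codes are the Occitan ones listed under case A.

```python
# Individual language -> macro-language, from case A (Occitan only).
PARENT = {"auv": "oc", "gsc": "oc", "lnc": "oc", "lms": "oc"}


def matches(language_range, tag):
    """Prefix fallback: the range equals the tag or is a hyphen-delimited prefix of it."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def normalize(tag):
    """Possibility 3: rewrite a bare individual code to the extended form."""
    lowered = tag.lower()
    first, sep, rest = lowered.partition("-")
    parent = PARENT.get(first)
    if parent is None:
        return lowered  # not a known individual code; leave as is
    return parent + "-" + first + sep + rest


print(matches("oc", "gsc"))             # possibility 1: no match without a table
print(matches("oc", "oc-gsc"))          # possibility 2: fallback works unaided
print(matches("oc", normalize("gsc")))  # possibility 3: normalize, then match
```

Under possibility 1 a consumer of oc cannot recognize gsc without
consulting the table; under 2 the prefix match needs no table at all;
under 3 every tag has to pass through something like normalize before it
is compared.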

For case B, ISO 639-5 must provide mapping tables (per above) in order
to make conversion between collective and individual language codes
practicable.  If this is not done, only possibility 2 will fly.

> John (and others on the list), are you happy with 3066
> bis? Indications of support or opposition (with reasons) would be
> useful at this juncture in finishing this work.

I support it in principle.  I haven't had a chance to check the current
draft, but I suspect that any nits I would find, others will find too.
Still, I'll try to squeeze in another pass.

Note:  Before leaving on vacation, Peter left us all a present:
an editor's draft of ISO 639-3, reachable from the last link on
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=PCUnicodeDocs
The draft is 142 pages long, but all except the first 14 and last 6
pages of the document are just the code to language mapping tables,
in code order and in language order.

-- 
Her he asked if O'Hare Doctor tidings sent from far     John Cowan
coast and she with grameful sigh him answered that      www.ccil.org/~cowan
O'Hare Doctor in heaven was. Sad was the man that word  www.reutershealth.com
to hear that him so heavied in bowels ruthful. All      jcowan at reutershealth.com
she there told him, ruing death for friend so young,
algate sore unwilling God's rightwiseness to withsay.   Ulysses, "Oxen"