What's the plan for ISO 639-3 and RFC 3066 ter?

Doug Ewell dewell at adelphia.net
Sat Aug 21 21:24:46 CEST 2004


Peter Constable <petercon at microsoft dot com> wrote:

>> So if ISO 639-3 wanted to assign the ISO 639-3 code "min" to the Min
>> language group in China, there would be no conflict with Minangkabau,
>
> Impossible. It will be impossible for alpha-3 identifiers to mean
> different things in different parts of ISO 639.

Thank you for clarifying this aspect of ISO 639-3.  The draft document
was very useful as well in helping me understand where this new part
fits with the rest of ISO 639, and how extended language subtags in RFC
3066bis and beyond are envisioned.  I hadn't seen any of this before.

Let's see how well I understand this:

In the broadest, most outrageously simplified terms, ISO 639-3
represents a superset of ISO 639-2.  All codes in 639-3 that also exist
in 639-2 mean the same thing.  The main difference is that 639-3 draws a
formal distinction between individual, macrolanguage, and collection
codes in 639-2.  It includes the first, includes and clarifies the use
of the second, and excludes the third.

Currently, no macrolanguage is itself included within another
macrolanguage.  That is, the hierarchy of macrolanguages contains only
two levels.

All macrolanguages defined in 639-3 are already encoded in 639-2.

This means for RFC 3066ter purposes, only the following scenarios
involving ISO 639-3 are likely to be useful:

ciw -- primary language subtag for a language not included in 639-2
(Chippewa)

oj-ciw -- combination of primary language subtag for a macrolanguage
(Ojibwa) and extended language subtag for a specific member of that
macrolanguage (Chippewa)

sgn-psd -- combination of existing primary language subtag (Sign
Languages) and extended language subtag (Plains Indian Sign Language),
where there is no "macro" relationship defined in 639-3
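
To make the three scenarios concrete, here is a minimal sketch in Python.
The two lookup tables are hypothetical and hold only the examples from
this message; they are not a real registry, and the function name is
mine, not anything from 639-3 or RFC 3066bis.

```python
# Hypothetical tables built only from the examples in this message.
MACRO_MEMBERS = {"oj": {"ciw"}}     # macrolanguage -> member languages
OTHER_EXTLANGS = {"sgn": {"psd"}}   # primary -> extlangs, no macro relation


def describe(tag):
    """Say which of the three scenarios a tag falls under, if any."""
    parts = tag.lower().split("-")
    if len(parts) == 1:
        return "primary language subtag only"
    if len(parts) == 2:
        primary, ext = parts
        if ext in MACRO_MEMBERS.get(primary, set()):
            return "macrolanguage + extended language subtag"
        if ext in OTHER_EXTLANGS.get(primary, set()):
            return "primary + extended language subtag, no macro relationship"
    return "not one of the three scenarios"


for example in ("ciw", "oj-ciw", "sgn-psd"):
    print(example, "->", describe(example))
```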

Now, the grandfathered tag zh-min-nan (Minnan, Hokkien, Amoy, etc.) is
frequently cited, both in RFC 3066bis and in discussions, as an example
of multiple-level extended language subtags.  But as it turns out, this
language could not be coded in this way using subtags derived from ISO
639-3, because min is allocated to Minangkabau in 639-2 (and thus also
in 639-3), not to the Min sub-family of Chinese languages.  In fact,
there is no macrolanguage code at all for the Min languages as a group;
they are coded separately under the macrolanguage Chinese (zho, or zh in
RFC 3066bis).  So zh-min-nan could not be shoehorned into the extended
language mechanism based on 639-3 in any event.

The only two ways Minnan could be coded in RFC 3066ter -- other than by
using the grandfathered tag zh-min-nan -- are:

nan
zh-nan
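
As a rough illustration of why only these two forms fall out, assume a
hypothetical table mapping each 639-3 code to its macrolanguage subtag,
if it has one.  Since there is no "min" level in that table, no
three-part tag can be generated.

```python
# Hypothetical mapping, holding only the example from this message:
# Minnan (nan) is a member of the macrolanguage Chinese (zh).
MACROLANGUAGE_OF = {"nan": "zh"}


def candidate_tags(code):
    """Return the ways a 639-3 code could appear in a tag."""
    tags = [code]                      # bare primary language subtag
    macro = MACROLANGUAGE_OF.get(code)
    if macro is not None:
        tags.append(f"{macro}-{code}")  # macrolanguage + extlang
    return tags


print(candidate_tags("nan"))
```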

The three-level extended-language hierarchy established by this one
grandfathered tag does not exist in any other tag; the de-*-1901 and
de-*-1996 tags for German and the three sgn-*-* tags do not represent
the same "extended language" concept.

Based on all this, I was all ready to propose that RFC 3066bis impose a
limit of one extended-language subtag, akin to the limit of one script
subtag and one region subtag.  This would simplify the model somewhat.
However, Peter subsequently wrote:

> My hunch that, through long-term use and maintenance, we'll eventually
> end up with at least some nested macrolanguages, but nothing is fixed
> in stone at this time (and have no plan to specify any sanction or
> restrictions related to this in the text of 639-3).

So it is theoretically possible that a lang-extlang-extlang tag could be
needed.  But this possibility now rests on a "hunch" rather than on a
concrete example, which is what zh-min-nan had been represented as being.

One of the goals of RFC 3066bis is to reduce the need for constant
revisions.  However, it will have to be revised anyway after ISO 639-3
is published, to define the terms of support for extended language
subtags derived from 639-3.  RFC 3066bis could still impose a limit of
one extlang subtag now, and this decision could be revisited when it
comes time to create RFC 3066ter, based on whether Peter's hunch seems
more or less likely at that time.

This might prevent RFC 3066bis from being burdened with a construct that
may prove never to be needed.  The grammar could always be extended in
the future to allow multiple extlang subtags, just as 3066bis extends
the 3066 grammar, while it would be almost impossible to go the other
way and restrict the grammar, invalidating something that was once
valid.  Implementations would probably have an easier time generating
and validating a single extlang subtag than an arbitrary number of them.
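
The difference for a validator can be sketched with two regular
expressions.  This is a deliberately simplified grammar of my own, not
the ABNF from the draft: it assumes a 2- or 3-letter primary subtag,
3-letter extlang subtags, an optional 4-letter script and an optional
region of 2 letters or 3 digits, and it ignores variants, extensions,
private use and grandfathered tags.

```python
import re

# Assumed, simplified tail: optional script, then optional region.
_TAIL = r"(?:-[a-z]{4})?(?:-(?:[a-z]{2}|[0-9]{3}))?"

# At most one extlang subtag versus any number of them.
ONE_EXTLANG = re.compile(r"[a-z]{2,3}(?:-[a-z]{3})?" + _TAIL, re.IGNORECASE)
MANY_EXTLANG = re.compile(r"[a-z]{2,3}(?:-[a-z]{3})*" + _TAIL, re.IGNORECASE)


def allowed(tag, pattern):
    """True if the whole tag matches the given simplified grammar."""
    return pattern.fullmatch(tag) is not None


for tag in ("ciw", "zh-nan", "sgn-psd", "zh-min-nan"):
    print(tag, allowed(tag, ONE_EXTLANG), allowed(tag, MANY_EXTLANG))
```

Under these assumptions only zh-min-nan is treated differently by the
two patterns, and it would be handled as a grandfathered tag anyway.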

One possible drawback is that the grammar might later need to be revised
at all.  As currently drafted, the future publication of ISO 639-3 and
the consequent creation of extlang subtags would not require any change
to the grammar.
Additionally, implementations that are built to allow only a single
extlang subtag might need to be revised if more than one is allowed in
the future.

Comments?

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/



