RFC3066bis: looking ahead

Mark Davis mark.davis at jtcsv.com
Tue Jan 20 03:10:26 CET 2004

No, it would be an issue. 3066bis is designed so that one can parse the tags and
decide without lookup which are language, which are script, and which are region
codes. Adding possibly dual language codes would break this. Have to think about
it a bit.
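To make the "parse without lookup" property concrete, here is a minimal sketch. It assumes the subtag shapes that were later standardized in RFC 4646 (language 2-3 letters, script 4 letters, region 2 letters or 3 digits, variant 5-8 alphanumerics or a digit plus 3 alphanumerics); the 2004 draft may differ in detail, and `classify` is a hypothetical helper. It labels each subtag by position and shape alone. A second three-letter language subtag such as the "yue" in "zh-yue" fits none of the later slots, which is the breakage described above.

```python
import re

# Assumed subtag shapes, modelled on the rules later standardized in RFC 4646;
# the 2004 RFC 3066bis draft may differ in detail.
LANGUAGE = re.compile(r"[a-z]{2,3}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")


def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag purely by position and shape, with no registry lookup."""
    subtags = tag.lower().split("-")
    if not LANGUAGE.fullmatch(subtags[0]):
        raise ValueError(f"bad language subtag: {subtags[0]!r}")
    result = [("language", subtags[0])]
    # Categories must appear in order: language, script, region, variant(s).
    stage = 0
    for sub in subtags[1:]:
        if stage < 1 and SCRIPT.fullmatch(sub):
            result.append(("script", sub))
            stage = 1
        elif stage < 2 and REGION.fullmatch(sub):
            result.append(("region", sub))
            stage = 2
        elif VARIANT.fullmatch(sub):
            result.append(("variant", sub))
            stage = 3
        else:
            # No slot exists for this shape at this position.
            raise ValueError(f"cannot classify subtag {sub!r} in {tag!r}")
    return result
```

For example, "zh-Hant-TW" classifies as language, script, region, while "zh-yue" raises an error because a three-letter subtag after the language has no defined role.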

► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- 
From: "Peter Constable" <petercon at microsoft.com>
To: <ietf-languages at alvestrand.no>
Sent: Mon, 2004 Jan 19 16:03
Subject: RFC3066bis: looking ahead

A ballot on ISO 639-3 closed last month, and this project is moving forward,
with a new draft expected to be circulated as a DIS very soon.

One of the innovations in ISO 639-3 is to recognize a third scope for
identifiers, in addition to the individual-language and collective scopes
included in ISO 639-2. This third scope is between the other two, and is being
called "macrolanguage". The idea of a macrolanguage identifier is that the thing
it represents is considered an individual language in some usage context, though
it encompasses two or more individual languages listed in ISO 639-3.

A good example of this is Chinese: it has "individual-language" identifiers in
ISO 639-1 and ISO 639-2 (zh, zho), though we know that there are distinct,
individual languages within its scope, such as Yue, Hokkien, etc. Thus, when ISO
639-3 is published, there will be alpha-2 and alpha-3 identifiers for "Chinese",
but there will also be alpha-3 identifiers for the various Chinese languages
such as Yue and Hokkien.

I started thinking about how to integrate ISO 639-3 into the RFC 3066 framework
once the former has been published. Ignoring for the moment the existence of
registered tags such as zh-yue, it would become possible to use a three-letter
identifier for Yue (let's say it's "yue" for sake of discussion), but there will
also be prior implementations that use "zh", and thus a need to relate "zh" and

It occurred to me that an easy way to do this would be to require hierarchical
tagging, "zh-yue", for these situations (i.e. only where a macrolanguage
identifier exists). This will make use of the existing language-range mechanism;
so, for instance, an HTTP request for "zh" will match on content tagged with
"zh-yue" (which wouldn't happen if the content were tagged as "yue").
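A minimal sketch of the proposal, assuming the prefix-matching rule for language ranges used by RFC 3066 and HTTP Accept-Language (a range matches a tag if it equals the tag, or equals a prefix of it followed by "-", compared case-insensitively). The `MACROLANGUAGE` table and both function names are hypothetical; "yue" is only the placeholder code used above for sake of discussion.

```python
# Hypothetical mapping from an individual-language code to the macrolanguage
# code that encompasses it; "yue" -> "zh" is the message's illustrative case,
# not an entry copied from a published ISO 639-3 table.
MACROLANGUAGE = {"yue": "zh"}


def hierarchical_tag(code: str) -> str:
    """Prefix the macrolanguage code only when one exists (the "zh-yue" rule)."""
    code = code.lower()
    macro = MACROLANGUAGE.get(code)
    return f"{macro}-{code}" if macro else code


def range_matches(language_range: str, tag: str) -> bool:
    """Prefix matching of a language range against a tag, case-insensitive."""
    r, t = language_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")
```

With this, `range_matches("zh", hierarchical_tag("yue"))` is true, whereas `range_matches("zh", "yue")` is false: a legacy request for "zh" reaches content only if it is tagged hierarchically.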

This has potential implications for the syntax being proposed in RFC 3066bis,
which allows sub-tags for language, then script, then region, then variant.
Something like "zh-yue" would involve another sub-tag between the first one, for
language, and a subsequent one for script, region or variant; this extra sub-tag
would also be for language. I don't think this is a serious problem, however: if
RFC 3066bis were published with syntax as in the current draft, it would simply
be a matter of revising the expansion given for "lang" so as to allow terminals
of the form 2*3 ALPHA "-" 3 ALPHA. As long as we don't end up using alpha-3
country IDs, this should work without problems.
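The caveat about alpha-3 country IDs can be checked mechanically. The sketch below assumes a revised production along the lines suggested, lang = 2*3ALPHA / 2*3ALPHA "-" 3ALPHA, rendered as a regular expression; the `alpha3_regions` flag is a hypothetical switch standing in for "regions may also be three letters". With only alpha-2 regions, the second subtag has exactly one reading; with alpha-3 regions, "zh-twn" could be either an extra language subtag or a region.

```python
import re

# Hypothetical revised "lang" production: 2*3ALPHA / 2*3ALPHA "-" 3ALPHA.
# ABNF string matching is case-insensitive, hence re.IGNORECASE.
LANG = re.compile(r"[a-z]{2,3}(?:-[a-z]{3})?", re.IGNORECASE)
ALPHA2_REGION = re.compile(r"[a-z]{2}", re.IGNORECASE)
ALPHA3_REGION = re.compile(r"[a-z]{3}", re.IGNORECASE)  # the case to avoid


def second_subtag_readings(tag: str, alpha3_regions: bool = False) -> set[str]:
    """Return the possible readings (language/region) of the second subtag."""
    primary, _, rest = tag.partition("-")
    second = rest.split("-")[0]
    readings = set()
    if second and LANG.fullmatch(f"{primary}-{second}"):
        readings.add("language")
    if ALPHA2_REGION.fullmatch(second) or (
        alpha3_regions and ALPHA3_REGION.fullmatch(second)
    ):
        readings.add("region")
    return readings
```

Under these assumptions, "zh-yue" reads only as language and "zh-TW" only as region; allowing alpha-3 regions makes "zh-twn" ambiguous.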


Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

Ietf-languages mailing list
Ietf-languages at alvestrand.no
