Harald Tveit Alvestrand harald at
Fri Apr 11 19:23:11 CEST 2003

--On fredag, april 11, 2003 09:03:54 -0700 Mark Davis 
<mark.davis at> wrote:

> I can understand your concern. Well, among languages we need to make at
> least the distinctions that Windows and others make; we have to be able to
> interwork with major platforms.

So one of your goals is round-trip loss-free translation between Microsoft 
Windows' language tagging system and the ICU language tagging system. Right?
What are the "others" you mention?

(hmmm.... I see the equivalent of Unicode's compatibility characters and a 
need for language tag normalization down that road.... I don't like it, but 
can see why we could need it....)

> If the end goal is to extend 3066bis to
> permit the equivalent of:
> 5. <iso_639_code> "-" <iso_15924_code>
> 6. <iso_639_code> "-" <iso_15924_code> "-" <iso_3166_code>
> then it does no harm to have the additional registrations. If we can only
> get az-Cyrl and az-Latn registered, or if the end goal for 3066bis will
> not permit both #5 and #6, then we would probably be forced to define our
> language codes as "based on" RFC 3066, but not identical.
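
For illustration only, forms #5 and #6 above could be recognized with a shape check along these lines. This is a minimal sketch, not anything taken from 3066bis: the names TAG_SHAPE and split_tag are hypothetical, and the code lengths (2-3 letters for ISO 639, 4 for ISO 15924, 2 for ISO 3166) are assumptions about the alphabetic forms of those codes. It checks shape only, not whether a code is actually assigned.

```python
import re

# Hypothetical shape check for forms #5 and #6:
#   <iso_639_code> "-" <iso_15924_code> [ "-" <iso_3166_code> ]
TAG_SHAPE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"     # ISO 639 language code
    r"-(?P<script>[A-Za-z]{4})"         # ISO 15924 script code
    r"(?:-(?P<region>[A-Za-z]{2}))?$"   # optional ISO 3166 country code
)


def split_tag(tag):
    """Return (language, script, region) with conventional casing,
    or None if the tag does not have form #5 or #6."""
    m = TAG_SHAPE.match(tag)
    if m is None:
        return None
    region = m.group("region")
    return (
        m.group("language").lower(),
        m.group("script").title(),
        region.upper() if region else None,
    )


if __name__ == "__main__":
    for tag in ("az-Latn", "az-cyrl-az", "az-AZ"):
        print(tag, "->", split_tag(tag))
```

A tag such as az-AZ is rejected here because the sketch deliberately covers only the two script-bearing forms.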

Thanks - what I'm trying to understand is the shape of the forcing function.

> The registrations proposed are only the tip of the iceberg; eventually we
> could need up to something like the following list (where * means each of
> the various scripts used with the spoken language):
> for zh-*: HK, MO, CN, SG, TW, US,...
> for az-*: AZ, IR, ...
> for uz-*: AF, KZ, KG, TJ, TM, UZ, ...
> for sr-*: YU, BA, MK, HR, ...
> which is why a generative mechanism is much simpler.
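
To make the size of that list concrete, a generative mechanism amounts to taking the cross product of scripts and countries for each language. The sketch below assumes the two Azerbaijani scripts named earlier (Cyrl, Latn) and the first two countries from the az line above; the function name generate_tags is hypothetical.

```python
from itertools import product


def generate_tags(language, scripts, regions):
    """Build every <language>-<script>-<region> combination (form #6)."""
    return [f"{language}-{s}-{r}" for s, r in product(scripts, regions)]


if __name__ == "__main__":
    # Scripts and countries are taken from the examples in this thread;
    # the real lists would be longer.
    for tag in generate_tags("az", ["Cyrl", "Latn"], ["AZ", "IR"]):
        print(tag)
```

With S scripts and R countries this yields S x R tags per language, each of which would otherwise need its own registration.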

It's definitely simpler for the producer of the codes. It may or may not be 
simpler for their consumer; Unicode's ability to preserve round-trip 
translations was one thing that engendered the concept of "unnormalized" 
Unicode text; that certainly did not make life simpler for the consumers of 

