Problems deciding if az- should have multiple registrations...

Peter_Constable at Peter_Constable at
Mon Apr 14 20:24:30 CEST 2003

Addison Phillips wrote on 04/14/2003 05:40:15 PM:

> Actually, I note that IE makes a script distinction ["Azeri (Latin)" and
> "Azeri (Cyrillic)"] in its languages dialog, but these both result in 
> RFC3066 tag "az". The GUI implies a distinction that cannot be expressed
> in a standard tag.
> My concern about this hypothetical is that it implies that if I *do* find a
> system that makes the arbitrarily deeper (generative) distinction, I 
> won't be able to set the behavior on that system correctly. You can already 
> see this with the IE example.

But in that IE example, there's obvious motivation for wanting to support 
the distinction; not so with the hypothetical az versus az-AZ.

> I also know for a fact that many more common locales are fully generated. My
> Solaris 2.8 system has:
> th and th_TH
> ko and ko_KR
> ja and ja_JP
> it and it_IT
> sv and sv_SE

But what difference can th versus th_TH possibly correspond to?

*If* we want RFC3066(bis) tags (th-TH, etc.) to be provided that can be 
mapped to this legacy locale cruft, perhaps we should explicitly register 
a complete list of these things, documenting exactly why they exist, and, 
whenever it's the case that no system makes any significant difference 
between the locales that specify a country and those that don't, that 
these are equivalent# to the shorter forms.

#Harald raised the issue of a counterpart to Unicode's canonical 
equivalence / normalisation. I wonder if we might even need something like 
the distinction between canonical and compatibility equivalence. 
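To make the suggestion concrete, here is a rough sketch in Python of what such a registered list and an equivalence check might look like. Everything in it is an assumption for illustration: the table name, the function names, and above all the entries themselves, since whether th-TH or ko-KR really belongs in such a list is exactly the open question. It also ignores case-insensitive matching of tags.

```python
# Hypothetical table of registered long-form tags that no known system
# treats differently from the short form. The entries are illustrative
# assumptions only, not an actual registration.
REGISTERED_EQUIVALENTS = {
    "th-TH": "th",
    "ko-KR": "ko",
}


def tag_from_legacy_locale(locale_name: str) -> str:
    """Turn a legacy locale name such as 'th_TH' into a tag such as 'th-TH'."""
    return locale_name.replace("_", "-")


def canonical_form(tag: str) -> str:
    """Return the registered shorter equivalent, or the tag itself if none."""
    return REGISTERED_EQUIVALENTS.get(tag, tag)


def equivalent(a: str, b: str) -> bool:
    """Two tags are equivalent when they share the same canonical form."""
    return canonical_form(a) == canonical_form(b)
```

Under these assumptions, equivalent("th-TH", "th") holds, while a pair such as en-US and en, which is not in the table, stays distinct.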

> This is in addition to the usual complement of en, es, and fr locales. But the
> latter have actual language or locale differences to distinguish them. Of
> the foregoing, only Korean is really a candidate for indeterminant 
> But there is a very real distinction between (for example) the sv and sv_SE
> locales or the ja and ja_JP locales. Not the least of which is: if I want a
> Unicode (UTF-8) locale, I need the long form and not the short one in 
> cases, at least on my machine.

You want us to have RFC3066(bis) tags to distinguish character set 
encodings? I'm somewhat open to considering your suggestion that we allow 
"language" tags to support the distinctions in legacy locales so as to be 
able to deal with the legacy encoding problems and to move on to better 
solutions. But when the distinction is one of encoding, I think that's 
going too far. In HTML and XML, there are already mechanisms distinct from 
those that use RFC3066 tags to handle charset/encoding distinctions. 
(Indeed, between HTML and HTTP, there are multiple mechanisms for handling 
charset/encoding distinctions.) I really think it would be a bad idea to 
get RFC3066(bis) mixed up in any way with character set encoding issues.
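
As a sketch of the separation I have in mind (Python, illustrative only): a POSIX-style locale name of the form language[_territory][.codeset][@modifier] can be split so that only the language and territory feed the language tag, while the codeset goes to the existing charset mechanism. The function names and the choice of HTTP headers as the target are my assumptions, and the sketch simply drops the modifier.

```python
from typing import Dict, Optional, Tuple


def split_legacy_locale(locale_name: str) -> Tuple[str, Optional[str]]:
    """Split e.g. 'ja_JP.UTF-8' into a language tag and a codeset.

    The '@modifier' part, if any, is discarded in this sketch.
    """
    base, _, _modifier = locale_name.partition("@")
    lang_part, _, codeset = base.partition(".")
    tag = lang_part.replace("_", "-")
    return tag, (codeset or None)


def http_headers(locale_name: str, media_type: str = "text/html") -> Dict[str, str]:
    """Carry language and encoding in separate headers.

    Caveat: legacy codeset names are not always registered charset names,
    so a real system would need a mapping step here.
    """
    tag, codeset = split_legacy_locale(locale_name)
    content_type = media_type
    if codeset is not None:
        content_type = f"{media_type}; charset={codeset}"
    return {"Content-Language": tag, "Content-Type": content_type}
```

With this split, ja_JP.UTF-8 yields the tag ja-JP plus a separate charset parameter, so the language tag never has to carry the encoding.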

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

> >
> > > 1. *Are* we in agreement that RFC3066bis needs writing?
> >
> > My vote is yes.
> Me too, but I'm waiting for the Official Poll from Harald before wading 
> deeper into the discussion.
> Best Regards,
> Addison
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> +1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
> -------------------------------------------------
> Internationalization is an architecture.
> It is not a feature.
> Chair, W3C-I18N-WG Web Services Task Force
