Problems deciding if az- should have multiple registrations...

Addison Phillips [wM] aphillips at
Mon Apr 14 18:50:03 CEST 2003

> > I also know for a fact that many more common locales are fully generated.
> > My Solaris 2.8 system has:
> >
> > th and th_TH
> > ko and ko_KR
> > ja and ja_JP
> > it and it_IT
> > sv and sv_SE
> But what difference can th versus th_TH possibly correspond to?

None that I know of.

> *If* we want RFC3066(bis) tags (th-TH, etc.) to be provided that can be
> mapped to this legacy locale cruft, perhaps we should explicitly register
> a complete list of these things, documenting exactly why they exist, and,
> whenever it's the case that no system makes any significant difference
> between the locales that specify a country and those that don't, that
> these are equivalent# to the shorter forms.

The problem is that existing generative rules allow for th-TH, I believe. My
concern is that the script seems more appropriate when affixed to the ISO639
tag, rather than variant style off the end. Actually, I find the bizarre mix
of locales in the POSIX system really annoying and wish UNIX vendors would
clean things up a little bit. Things aren't so bad in (say) Java.

> #Harald raised the issue of a counterpart to Unicode's canonical
> equivalence / normalisation. I wonder if we might even need something like
> the distinction between canonical and compatibility equivalence.

That seems like a good idea. Mapping rules seem like a way to do this to me.
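
As a rough illustration of what such mapping rules could look like, here is a
minimal C sketch. The table contents are hypothetical: they simply reuse the
five Solaris pairs listed earlier and assume each long form is equivalent to
its short form, which nobody has actually established. The names tag_equiv,
EQUIV, tag_equal and canonical_tag are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical equivalence table: long tag -> shorter equivalent.
 * The entries are an assumption for illustration, not a registry. */
struct tag_equiv {
    const char *long_form;
    const char *short_form;
};

static const struct tag_equiv EQUIV[] = {
    { "th-TH", "th" },
    { "ko-KR", "ko" },
    { "ja-JP", "ja" },
    { "it-IT", "it" },
    { "sv-SE", "sv" },
};

/* Case-insensitive ASCII comparison, since language tags are not
 * case-sensitive. Returns 1 if the tags match, 0 otherwise. */
static int tag_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Return the shorter equivalent if the table lists one,
 * otherwise return the tag unchanged. */
const char *canonical_tag(const char *tag)
{
    size_t i;

    for (i = 0; i < sizeof EQUIV / sizeof EQUIV[0]; i++) {
        if (tag_equal(tag, EQUIV[i].long_form))
            return EQUIV[i].short_form;
    }
    return tag;
}
```

A second table with a weaker kind of equivalence could sit next to this one if
a canonical/compatibility distinction turns out to be needed.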

> > Not the least of which is: if I want a
> > Unicode (UTF-8) locale, I need the long form and not the short one in both
> > cases, at least on my machine.
> You want us to have RFC3066(bis) tags to distinguish character set
> encodings?

Oh heavens no! By no means is that what I meant.

The point is that calling setlocale with "ja.UTF8" doesn't work. I can
generate the appropriate Unicode-encoded locale on some systems only by
filling out the region tag. That's my point. And not a hypothetical one. I
have a small C program that does collation and it tries hard to convert the
HTTP requested language into a UTF-8 locale for LC_COLLATE. This works more
reliably if I have a full language tag. Again, this is possibly a mapping
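
To make the fallback concrete, here is a minimal sketch of the general idea,
not the actual program. It assumes POSIX-style locale names of the form
"ja_JP.UTF-8"; the exact codeset spelling varies between systems, so treat the
".UTF-8" suffix as an assumption. The function names tag_to_utf8_locale and
set_collation_for_tag are invented for this example.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Turn a language tag such as "ja-JP" into a POSIX-style locale name
 * such as "ja_JP.UTF-8". The ".UTF-8" spelling is an assumption.
 * Returns 0 on success, -1 if the result does not fit in out. */
int tag_to_utf8_locale(const char *tag, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s.UTF-8", tag);
    char *p;

    if (n < 0 || (size_t)n >= outlen)
        return -1;

    /* Replace only the first hyphen inside the tag itself, so the
     * hyphen in the "UTF-8" suffix is left alone. */
    p = strchr(out, '-');
    if (p != NULL && p < out + strlen(tag))
        *p = '_';
    return 0;
}

/* Try the full tag first (language plus region), then fall back to the
 * bare language. Returns the name setlocale accepted, or NULL if
 * neither locale is available on this system. */
const char *set_collation_for_tag(const char *tag)
{
    char name[64];
    char lang[16];
    size_t len = strcspn(tag, "-");
    const char *result;

    if (tag_to_utf8_locale(tag, name, sizeof name) == 0) {
        result = setlocale(LC_COLLATE, name);
        if (result != NULL)
            return result;
    }

    if (len == 0 || len >= sizeof lang)
        return NULL;
    memcpy(lang, tag, len);
    lang[len] = '\0';

    if (tag_to_utf8_locale(lang, name, sizeof name) != 0)
        return NULL;
    return setlocale(LC_COLLATE, name);
}
```

With a request for "ja-JP" this tries "ja_JP.UTF-8" before "ja.UTF-8"; which
of the two succeeds depends entirely on the locales installed on the machine.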

Best Regards,


More information about the Ietf-languages mailing list