Problems deciding if az- should have multiple registrations...
Addison Phillips [wM]
aphillips at webmethods.com
Fri Apr 11 11:46:33 CEST 2003
Michael Everson wrote:
>> Is there a distinction in orthography between each pair of the following?
> Your answer is "No", then.
Mark's example, though, seems to indicate that the existing regime has
not made a hard-and-fast distinction either. The orthographic
distinction as justification for not registering seems untenable. The
real distinction appears to be whether the code would be worth
registering as a special case because there is demand for using it as a
>> If we can only
>> get az-Cyrl and az-Latn registered, or if the end goal for 3066bis
>> will not
>> permit both #5 and #6, then we would probably be forced to define our
>> language codes as "based on" RFC 3066, but not identical.
> Or you could alter your software and make it work properly with language
> codes and locales.
But that's the point, isn't it? It isn't ICU that is being dealt with
here, but the underlying system that ICU (or my software, for that
matter) is running on. ICU could be modified, but if it can't
interoperate then there will be problems.
Locale identifiers are hobbled by a long-standing confusion with language
tags. Fixing locales requires either parallel changes to language tags
If you examine the case for divergence (which is a case I've made
forcefully for the past year or so, so I've spent a lot of time thinking
about it), you eventually end up with problems related to the fact that
the language tag is necessarily part of the locale--and it conflicts
with portions of the locale ID designed to solve this same problem. Long
discussions with Mark and others have led me to conclude that the
simpler and more satisfying approach is to treat the language tag as the
locale identifier and everything else as preferences... and to fix
the language tags themselves (to deal with an obvious glaring omission)
rather than try to circle around the problem in the locale tags.
You might not reach the same conclusion.
In any event, if language==locale ID, we really should fix the edge
cases of language tagging. There appears to be no resistance to the one
case I actually care about today (zh-*), but I find the problems with
the parallel example of az-* disturbing.
I imagine that there are systems with locales that look like:
az.ISO8859_1@latin
az-AZ.ISO8859_1@latin
These are not different on some level recognized as linguistic, but the
data files for these locales are actually not the same and may actually
*be* different in some recognizably linguistic manner.
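To make the comparison concrete, here is a minimal sketch that splits identifiers of this kind into their parts. It assumes the common POSIX-style shape language[_TERRITORY][.codeset][@modifier] and also accepts a hyphen before the territory; the names LocaleParts and parse_locale are hypothetical and do not come from ICU or any other library.

```python
import re
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    """Components of a POSIX-style locale identifier (hypothetical type)."""
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Assumed shape: language[_TERRITORY][.codeset][@modifier]
# A hyphen is also accepted before the territory.
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale(identifier: str) -> LocaleParts:
    """Split a locale identifier into its parts; missing parts are None."""
    match = _LOCALE_RE.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized locale identifier: {identifier!r}")
    return LocaleParts(**match.groupdict())


a = parse_locale("az.ISO8859_1@latin")
b = parse_locale("az-AZ.ISO8859_1@latin")

# Same language, codeset, and modifier; only the territory differs.
print(a)
print(b)
print(a.language == b.language, a.territory, b.territory)
```

Under these assumptions the two identifiers agree on everything except the territory, which is exactly why a system can ship separate data files for them even though nothing in the language portion distinguishes the two.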
Japanese has similar problems. There are many systems that have both
'ja' and 'ja_JP' locales. These are not linguistically different unless
you follow Martin's argument that number formats and the like are
language or orthographic differences.
Nonetheless, if we are all in agreement that a generative RFC3066bis
should be created, registering these temporary markers seems either a)
irrelevant [possibly a waste of time if the standards process can go
fast enough] or b) forward-looking depending on your viewpoint.
So I guess:
1. *Are* we in agreement that RFC3066bis needs writing?
2. Why not register things that will become sanctioned in #1?
> These codes are to encode language distinctions, and are NOT intended to
> become the catch-all for locale identification. The point is, locale
> specification has not been done correctly, and it should not be on the
> entity-coders to fix it. It should be on the people who have botched
> their locale identification structures.
Only if the locale specification doesn't rely on the entities. If the
case is that locales and RFC3066's use of ISO639 and ISO3166 as
Ur-standards is just happenstance, then you are correct. It is my belief
(and I believe Mark's) that the similarity is not actually accidental.
That is, fixing language tags and then defining them as the Ur-standard
for locale identifiers solves a lot of long standing problems and hurts
almost no one.
If Serbian, Uzbek, and Azeri form the complete list of languages that
require some additional registration, then I think we could register
these, given some demonstration of need, and move along. Obviously the
fudge in that sentence is "some". Mark has "some" justification. You
would like "more". Given that no one is likely to research tiny
orthographic differences, the justification proposed is that some form
of unknown-but-real legacy (computer) differentiation is still a
The counter argument appears to be "the computer distinction does not
mark a real human-language distinction". The long list of English codes
suggests that this argument is actually empty: a country *could*
legislate something, but none appear to have done so to the extent that
a separate code need be summarily registered *in advance* of the
difference appearing. Or am I reading this wrong?
Addison P. Phillips
Director, Globalization Architecture
+1 408.962.5487 mailto:aphillips at webmethods.com
Internationalization is an architecture. It is not a feature.
Chair, W3C I18N WG Web Services Task Force