Problems deciding if az- should have multiple registrations...

Fri Apr 11 20:18:36 CEST 2003

At 10:46 -0700 2003-04-11, Addison Phillips [wM] wrote:
>Michael Everson wrote:
>>>
>>>Is there a distinction in orthography between each pair of the following?
>>
>>Your answer is "No", then.
>
>Mark's example, though, seems to indicate that the existing regime 
>has not made a hard-and-fast distinction either. The orthographic 
>distinction as justification for not registering seems untenable.

What?

I asked whether "az-Latn" and "az-Latn-AZ" differed in any way. If 
they do not, then the codes are duplicates.

>The real distinction appears to be whether the code would be worth 
>registering as a special case because there is demand for using it 
>as a separate identifier.

No. Because there will never be an end to that, and this is voluntary work.

>But that's the point, isn't it? It isn't ICU that is being dealt 
>with here, but the underlying system that ICU (or my software, for 
>that matter) is running on. ICU could be modified, but if it can't 
>interoperate then there will be problems.
>
>Locale identifiers are hobbled by a long term confusion with 
>language tags. Fixing locales requires either parallel changes to 
>language tags or divergence.

Language tags are language tags, not locale tags. If the computer 
industry or some players in it have gunked-up software because 
programmers made erroneous assumptions about the structure of 
"locale" with regard to its elements, it is incumbent on the industry 
or those players to structure their software more accurately with 
regard to good localization and internationalization practice.

Remember way back when everyone still hard-coded user-visible text strings?

>If you examine the case for divergence (which is a case I've made 
>forcefully for the past year or so, so I've spent a lot of time 
>thinking about it), you eventually end up with problems related to 
>the fact that the language tag is necessarily part of the 
>locale--and it conflicts with portions of the locale ID designed to 
>solve this same problem.

Language tags are there to tag languages. They are not there to solve 
everyone's locale problems.

>Long discussions with Mark and others have led me to the conclusion 
>that the simpler and more satisfying conclusion is to treat language 
>as the locale identifier and all the other things as preferences.... 
>and fix the language tags themselves (to deal with an obvious 
>glaring omission) rather than try to circle around the problem in 
>the locale tags.
>
>You might not have the same conclusion.

I do not.

>In any event, if language==locale ID, we really should fix the edge 
>cases of language tagging. There appears to be no resistance to the 
>one case I actually care about today (zh-*), but I find the problems 
>with the parallel example of az-* disturbing.

The zh-* examples MAY differentiate different entities, and the 
proposed az-* examples have not been shown to do so.

>I imagine that there are systems with locales that look like:
>
>az.ISO8859_1@latin
>az-AZ.ISO8859_1@latin

Ghastly.

>These are not different on some level recognized as linguistic, but 
>the data files for these locales are actually not the same and may 
>actually *be* different in some recognizably linguistic manner.

May it, indeed?

>Japanese has similar problems. There are many systems that have both 
>'ja' and 'ja_JP' locales. These are not linguistically different 
>unless you follow Martin's argument that number formats and the like 
>are language or orthographic differences.

639 and SIL and 3066 specify language tags, Addison, not locales.

>Nonetheless, if we are all in agreement that a generative RFC3066bis 
>should be created, registering these temporary markers seems either 
>a) irrelevant [possibly a waste of time if the standards process can 
>go fast enough] or b) forward-looking depending on your viewpoint.
>
>So I guess:
>
>1. *Are* we in agreement that RFC3066bis needs writing?

In order to permit a greater flexibility in tagging LANGUAGES, yes. 
In order to extend it to solve the woes of misbegotten 
locale-identification systems, no.

>2. Why not register things that will become sanctioned in #1?

More haste less speed?

>Only if the locale specification doesn't rely on the entities. If 
>the case is that locales and RFC3066's use of ISO639 and ISO3166 as 
>Ur-standards is just happenstance, then you are correct. It is my 
>belief (and I believe Mark's) that the similarity is not actually 
>accidental.

I think that is, if you will forgive me, sloppy reasoning. The 
reality is more subtle and complex than that.

>That is, fixing language tags and then defining them as the 
>Ur-standard for locale identifiers solves a lot of long standing 
>problems and hurts almost no one.

I do not believe that a language-tag = locale. Many users are 
multilingual. Many users use languages in places where other 
languages are spoken in the majority.

Locale distinctions are a different thing from language. They may be 
related, but, as I said, 639 and SIL and 3066 specify codes to 
identify languages, not locales.

>If Serbian, Uzbek, and Azeri form the complete list of languages 
>that require some additional registration, then I think we could 
>register these, given some demonstration of need, and move along. 
>Obviously the fudge in that sentence is "some". Mark has "some" 
>justification. You would like "more". Given that no one is likely to 
>research tiny orthographic differences, the justification proposed 
>is that some form of unknown-but-real legacy (computer) 
>differentiation is still a difference.

I know what a language is and what a locale is, and I'm here to judge 
the registration of codes for languages.

>The counter argument appears to be "the computer distinction does 
>not mark a real human-language distinction". The long list of 
>English codes suggests that this argument is actually empty: a 
>country *could* legislate something, but none appear to have done so 
>to the extent that a separate code need be summarily registered *in 
>advance* of the difference appearing. Or am I reading this wrong?

I don't understand your "counter argument".
-- 
Michael Everson * * Everson Typography *  * http://www.evertype.com