Problems deciding if az- should have multiple registrations...
Addison Phillips [wM]
aphillips at webmethods.com
Fri Apr 11 13:27:59 CEST 2003
Well, I'm probably going to regret replying to this.
>
> I asked whether "az-Latn" and "az-Latn-AZ" differed in any way. If they
> do not, then the codes are duplicates.
Okay, but the argument has been presented that 3166 coded values are
registered to distinguish national or official languages. So:
1. Is 'az' the official or national language of 'AZ'? If yes then
register 'az-AZ'...
2. Is there a distinction between 'az-latn' and 'az-cyrl'? If yes, then
register 'az-latn' and 'az-cyrl'.
3. See #1. Repeat as needed.
>>
>> Locale identifiers are hobbled by a long term confusion with language
>> tags. Fixing locales requires either parallel changes to language tags
>> or divergence.
>
> Language tags are language tags, not locale tags. If the computer
> industry or some players in it have gunked-up software because
> programmers made erroneous assumptions about the structure of "locale"
> with regard to its elements, it is incumbent on the industry or those
> players to structure their software more accurately with regard to good
> localization and internationalization practice.
I'm not arguing that point. Only that fixing locales relies at least to
some extent on fixing language identifiers. I need a way to distinguish
'zh-Hant' from 'zh-Hans'. The best way to do that would be to get an
Official Language Tag because it is the language tag 'zh' that is
gunking things up, if you will.
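
To make the 'zh-Hant' versus 'zh-Hans' distinction concrete, here is a minimal sketch of how software might pull a script out of a tag. It assumes the convention visible in 'az-Latn-AZ': a four-letter subtag is a script and a two-letter subtag is a region. The function name split_tag and the returned dictionary layout are hypothetical, not part of any standard or library.

```python
import re

def split_tag(tag):
    """Split a language tag into language, script and region by subtag shape.

    Assumption: a 4-letter subtag is a script, a 2-letter subtag is a region.
    Anything else is kept, untouched, under "other".
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None,
              "region": None, "other": []}
    for sub in parts[1:]:
        is_script = re.fullmatch(r"[A-Za-z]{4}", sub) is not None
        is_region = re.fullmatch(r"[A-Za-z]{2}", sub) is not None
        if is_script and result["script"] is None and result["region"] is None:
            result["script"] = sub.title()    # e.g. "latn" -> "Latn"
        elif is_region and result["region"] is None:
            result["region"] = sub.upper()    # e.g. "az" -> "AZ"
        else:
            result["other"].append(sub)
    return result

# The two Chinese tags differ only in the script field.
traditional = split_tag("zh-Hant")
simplified = split_tag("zh-Hans")
```

With a plain 'zh' tag the script field is simply empty, which is exactly the information I am missing today.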
>
>> If you examine the case for divergence (which is a case I've made
>> forcefully for the past year or so, so I've spent a lot of time
>> thinking about it), you eventually end up with problems related to the
>> fact that the language tag is necessarily part of the locale--and it
>> conflicts with portions of the locale ID designed to solve this same
>> problem.
>
> Language tags are there to tag languages. They are not there to solve
> everyone's locale problems.
But one thing language tags tag is software resources contained in
locales. Tagging those resources is a valid use, even by your
definition. I'm trying to solve locale interoperability problems and my
point is that introducing a field in the locale structure called
"script" is solving a problem that language tags really ought to, hence
this discussion. We can solve the problem separately or together.
Together looks like a better choice, given that this forum is already
working on it and changes in language tags will affect locale
identifiers anyway.
>> I imagine that there are systems with locales that look like:
>>
>> az.ISO8859_1@latin
>> az-AZ.ISO8859_1@latin
>
> Ghastly.
Exactly so. My problem is how to run my Java program over the top of
that mess and still get system messages in the right language, script,
and orthography.
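
As a rough illustration of what "running over the top of that mess" involves, the sketch below converts a locale identifier of the general shape language[_REGION][.charset][@modifier] into a language tag. It is written in Python for brevity rather than Java. The assumptions are mine: the modifier is introduced by "@" as in POSIX-style locale names, the charset is simply dropped, and the MODIFIER_TO_SCRIPT table and the function name locale_to_tag are hypothetical.

```python
import re

# Hypothetical mapping from locale modifiers to four-letter script codes.
MODIFIER_TO_SCRIPT = {"latin": "Latn", "cyrillic": "Cyrl"}

# language, optional region, optional ".charset", optional "@modifier"
LOCALE_RE = re.compile(
    r"^(?P<lang>[A-Za-z]{2,3})"
    r"(?:[_-](?P<region>[A-Za-z]{2}))?"
    r"(?:\.(?P<charset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)

def locale_to_tag(locale_id):
    """Build a language tag from a locale identifier, discarding the charset."""
    match = LOCALE_RE.match(locale_id)
    if match is None:
        raise ValueError("unrecognized locale identifier: " + locale_id)
    subtags = [match.group("lang").lower()]
    modifier = (match.group("modifier") or "").lower()
    script = MODIFIER_TO_SCRIPT.get(modifier)
    if script is not None:
        subtags.append(script)          # script goes before the region
    if match.group("region"):
        subtags.append(match.group("region").upper())
    return "-".join(subtags)

tag_without_region = locale_to_tag("az.ISO8859_1@latin")
tag_with_region = locale_to_tag("az-AZ.ISO8859_1@latin")
```

Note that the two locales come out as two different tags, with and without the region, which is the duplication question this thread started with.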
>
>> These are not different on some level recognized as linguistic, but
>> the data files for these locales are actually not the same and may
>> actually *be* different in some recognizably linguistic manner.
>
> May it, indeed?
Sure, why not? We're both speculating here! I didn't go and compare all
the files either, but I'm pretty sure I know of at least two fields in
the above speculated locales (yes, I know, I know...) that are different.
>
>> Japanese has similar problems. There are many systems that have both
>> 'ja' and 'ja_JP' locales. These are not linguistically different unless
>> you follow Martin's argument that number formats and the like are
>> language or orthographic differences.
>
>
> 639 and SIL and 3066 specify language tags, Addison, not locales.
No kidding...? Didn't I say that a few messages back (and in fact
propose language to put into a 3066bis to deal with that)?
I think that is the point I'm making: locales often use one or the other
form interchangeably, even when there is no reason to. Hence, there is
likely to be data (software resources in this case, but also content,
etc.) tagged with the 'ja-JP' form.
In fact, I *know* that there is (plain ol' textual) XML and HTML content
tagged as ja-JP, because I've seen it. It's pointless to tag it that
way, but there it is!
Your argument is essentially that text tagged 'az-Latn' is somehow
different from text tagged 'az-AZ', and that we shouldn't modify the tag
'az-AZ' to identify the real differences in language, which may be
better conveyed by the Latin or Cyrillic identifiers.
Okay, language tags not only aren't locale tags, but they must never
touch those icky locale things, even obliquely... but I suspect that
this distinction is lost on average users trying to get the content they
want via Accept-Language.
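
To show why the choice of tag matters to those users, here is a minimal sketch of the prefix matching that RFC 3066 and HTTP describe for language ranges: a range matches a tag if it equals the tag, or equals a prefix of the tag that is followed by "-". The function names are hypothetical, and the sketch ignores q-values and simply takes the ranges in the order given.

```python
def range_matches(language_range, tag):
    """True if the range equals the tag or is a prefix ending at a '-'."""
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

def select(accept_ranges, available_tags):
    """Return the first available tag matched by the ranges, in range order."""
    for language_range in accept_ranges:
        for tag in available_tags:
            if range_matches(language_range, tag):
                return tag
    return None

# A user asking for 'ja' gets content tagged 'ja-JP' ...
found = select(["ja"], ["en", "ja-JP"])
# ... but a user asking for 'ja-JP' does not get content tagged plain 'ja'.
missed = select(["ja-JP"], ["en", "ja"])
# Likewise a request for 'az-Latn' never reaches content tagged 'az-AZ'.
az_missed = select(["az-Latn"], ["az-AZ"])
```

Under this rule, which form the content happens to be tagged with decides whether the user sees it at all, whatever the linguists conclude.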
>> So I guess:
>>
>> 1. *Are* we in agreement that RFC3066bis needs writing?
>
> In order to permit a greater flexibility in tagging LANGUAGES, yes. In
> order to extend it to solve the woes of misbegotten
> locale-identification systems, no.
Okay, that's what I want. But it is worth noting that fixing language
tags helps those misbegotten systems too. Knock-on benefits are good.
>
>> Only if the locale specification doesn't rely on the entities. If the
>> case is that locales and RFC3066's use of ISO639 and ISO3166 as
>> Ur-standards is just happenstance, then you are correct. It is my
>> belief (and I believe Mark's) that the similarity is not actually
>> accidental.
>
> I think that is, if you will forgive me, sloppy reasoning. The reality
> is more subtle and complex than that.
How long do you want the email to be?
I don't see this as sloppy. The fact is that 3066 tags (based on
639/3166) and locale identifiers (often based on 639/3166) are both
similarly constituted and that originally the 639/3166 portion of most
kinds of locale identifier was supposed to identify the 'natural
language' portion of the locale. This is not an accident. It is by design.
I can't speak for the originators of RFC1766. Probably they will come out
of the woodwork to inform me that they intentionally chose the same pair
of Ur standards but for different, incompatible reasons. But I'm not
sure what difference that makes: both systems are trying to identify
language preferences on some level. It makes sense to compromise on a
system that satisfies as many as possible. Having locales use a
different language identifier seems sloppy to me.
>
>> That is, fixing language tags and then defining them as the
>> Ur-standard for locale identifiers solves a lot of long standing
>> problems and hurts almost no one.
>
>
> I do not believe that a language-tag = locale. Many users are
> multilingual. Many users use languages in places where other languages
> are spoken in the majority.
Hence efforts like ULocale, one version of which is at
http://www.inter-locale.com/whitepaper/localeTags.jsp
You'll note that I separate language and region. But I still need a
language tag for the language material (!!)
>
>> If Serbian, Uzbek, and Azeri form the complete list of languages that
>> require some additional registration, then I think we could register
>> these, given some demonstration of need, and move along. Obviously the
>> fudge in that sentence is "some". Mark has "some" justification. You
>> would like "more". Given that no one is likely to research tiny
>> orthographic differences, the justification proposed is that some form
>> of unknown-but-real legacy (computer) differentiation is still a
>> difference.
>
> I know what a language is and what a locale is, and I'm here to judge
> the registration of codes for languages.
Lucky you. On both counts.
>
>> The counter argument appears to be "the computer distinction does not
>> mark a real human-language distinction". The long list of English
>> codes suggests that this argument is actually empty: a country *could*
>> legislate something, but none appear to have done so to the extent
>> that a separate code need be summarily registered *in advance* of the
>> difference appearing. Or am I reading this wrong?
>
>
> I don't understand your "counter argument".
See 1-3 above. I guess the problem is that you want evidence of a
"different actual language" in order to create two more codes to fill
out the pattern of tags, but at the same time:
a) there are examples of not actually different languages that have
already been registered or are extant.
b) there is no close definition of how separate the language has to be
before the languages are "different enough" or "actual enough". It
appears to be a "duck test". So what constitutes a duck?
--
Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.
+1 408.962.5487 mailto:aphillips at webmethods.com
-------------------------------------------
Internationalization is an architecture. It is not a feature.
Chair, W3C I18N WG Web Services Task Force
http://www.w3.org/International/ws