[Fwd: Response to Mark's message]

Addison Phillips [wM] aphillips at webmethods.com
Thu Apr 3 22:46:30 CEST 2003


Funny how software degrades like that. Must be time to upgrade again 
;-). It's the end of the day and my concentration is waning quickly. 
Nonetheless I'd like to send a quick, not entirely coherent response. My 
Mozilla must have whatever your Outlook has.......

> 
> Anyway, I want to emphasize that the issue of how people are using RFC 3066
> codes is orthogonal to the issue of whether script codes need to be added to
> RFC 3066. We need them in any event, simply to distinguish written
> languages.

I think so too. I think that scripts in language codes should be kept 
clear of "locale tags". Actually, the more I think about this problem, 
the more enamored I become of not even using the word locale and calling 
it "internationalization context" or some such.

> 
> That being said, one of the main problems is that people don't have a clear
> notion of what locale codes are supposed to be.

Agreed.

I'm firmly opposed to locales being a form of user preferences. If they 
have any utility whatsoever it is in the idea that they produce some 
sort of predictable behavior (cf. LDML), even if the behaviors are 
broadly defined (at least 24 date/time patterns in each Java 
locale---and more than 40 for a CultureInfo, without even considering 
the calendar mess).

This leaves the question of what a locale actually is and how, if at 
all, it is different from a language.

A case can be made that certain data elements commonly found in locales, 
such as default separators, date/time patterns, and numeric patterns are 
culturally linked rather than linked to a specific language. This is the 
argument that I was presenting in item #6. Aside from a few data points 
like this, and given that 3066 already has regional identifiers embedded 
into it, I think we can safely ignore this argument.

> 
> 1. As far as your points about the location of the script tag, perhaps you
> can explain why it would break current software. 

A few brief tests of Servlet with "zh-Hant-HK" produce this result from 
"getLocale()": I get a Locale with lang=zh, region=Hant, and variant=HK. 
In other words, it produces Simplified Chinese (the nearest "real" 
locale being "zh"). I suspect that many implementations are equally 
simpleminded, using some form of parsing (in Java, it's probably nothing 
more than a StringTokenizer---in fact I wrote one of these myself about 
five years ago...).
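
To make the failure concrete, here is a minimal sketch of the kind of 
simpleminded parser I mean. It is NOT the actual Servlet code; the class 
name, the "resolve" fallback, and the set of installed locales are all 
made up for illustration. It assigns subtags to language, region, and 
variant purely by position:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical illustration only; not taken from any real container.
class NaiveLocaleParser {

    // Split on '-' or '_' and assign subtags by position:
    // first = language, second = region, third = variant.
    static String[] parse(String tag) {
        StringTokenizer st = new StringTokenizer(tag, "-_");
        String lang = st.hasMoreTokens() ? st.nextToken() : "";
        String region = st.hasMoreTokens() ? st.nextToken() : "";
        String variant = st.hasMoreTokens() ? st.nextToken() : "";
        return new String[] { lang, region, variant };
    }

    // Assumed fallback chain: lang_region_variant, lang_region, lang.
    // Returns the first name found in 'available', or "" if none match.
    static String resolve(String[] p, Set<String> available) {
        String[] candidates = {
            p[0] + "_" + p[1] + "_" + p[2],
            p[0] + "_" + p[1],
            p[0]
        };
        for (String c : candidates) {
            if (available.contains(c)) {
                return c;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        // Assumed set of installed locales, for illustration.
        Set<String> installed =
            new HashSet<>(Arrays.asList("zh", "zh_HK", "zh_TW"));
        String[] parts = parse("zh-Hant-HK");
        // "Hant" lands in the region slot, so only plain "zh" matches.
        System.out.println(Arrays.toString(parts));
        System.out.println(resolve(parts, installed));
    }
}
```

With "Hant" sitting in the region slot, neither "zh_Hant_HK" nor 
"zh_Hant" matches anything installed, so the lookup falls all the way 
back to "zh".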

Perhaps we can reframe this issue. Everyone duck, because no one will 
like the following. It's a truly and utterly awful idea....

A case can be made that script variation in RFC3066 defines a separate 
language. There is a precedent for this. It's the Nynorsk/Bokmål pairing 
(I said to duck). If this pair, which represents a "mere" orthographic 
difference, merits separate codes, then shouldn't we be able to get 
separate codes for Chinese and Azeri?

I'm not suggesting new 639 codes, but we could do something hokey like:

"zh.hant" and "zh.hans"

or

"az.latn" and "az.cyrl"

On the positive side, it still fools existing software, but the 
"breakage" is different in quality. At least it produces the "null" 
locale and could be "trained" to produce zh-HK. [I'm of the opinion that 
zh-Hant-HK could produce a Java locale zh_HK_Hant or (zh-Hant)_HK.]
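
As a rough sketch of that bracketed opinion (and nothing more than that), 
a parser could treat a four-letter alphabetic second subtag as a script 
and move it into the variant slot, so that "zh-Hant-HK" comes out as 
zh / HK / Hant. The class name is hypothetical, and the four-letter test 
assumes script subtags follow the ISO 15924 shape:

```java
import java.util.Arrays;

// Hypothetical illustration only; the subtag rules are assumptions.
class ScriptAwareLocaleParser {

    // Returns { language, region, variant }. A four-letter alphabetic
    // second subtag is assumed to be a script and is moved to the
    // variant slot. Any variant that follows a script is dropped in
    // this sketch.
    static String[] parse(String tag) {
        String[] parts = tag.split("[-_]");
        String lang = parts[0];
        String script = "";
        int next = 1;
        if (parts.length > 1 && parts[1].matches("[A-Za-z]{4}")) {
            script = parts[1];
            next = 2;
        }
        String region = parts.length > next ? parts[next] : "";
        String variant;
        if (!script.isEmpty()) {
            variant = script;
        } else {
            variant = parts.length > next + 1 ? parts[next + 1] : "";
        }
        return new String[] { lang, region, variant };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("zh-Hant-HK")));
        System.out.println(Arrays.toString(parse("az-Latn")));
        System.out.println(Arrays.toString(parse("en-US")));
    }
}
```

The point of putting the script last is that an ordinary 
language/region/variant fallback would then land on zh_HK rather than 
on bare "zh".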

> 
> 2. xml:lang. Hard to say. If what is meant by locale is the narrow meaning
> above, then it does little harm to transmit that information with xml:lang.
> I agree that to depend on that to transmit the broad sense (a) for locale
> would be bad.

Maybe I agree on some level. But I have actual cases of how this leads 
to problems and fallacies in development. I'm putting a document 
together on the "aggregation" pattern for WSTF anyway, which 
encapsulates why I react so vehemently to the idea that xml:lang can be 
a "locale". When I post that I'll send a pointer to it, rather than 
noodle around here.

Interestingly, of course, xml:lang does lead, even in my argument 
against it, to a locale object. If you want to do natural language 
processing (think text breaking, for example) on something tagged with 
an xml:lang, it makes sense to use the xml:lang as a locale for that 
purpose. But I digress.

> 3. I really wouldn't want to see anything like your:
> 
>>Accept-Language: en-US
>>Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...
> 
Agreed. We need to figure this mess out in the cleanest possible way. I 
think that is coöpting 3066.

Addison


