[Fwd: Response to Mark's message]
Addison Phillips [wM]
aphillips at webmethods.com
Thu Apr 3 22:46:30 CEST 2003
Funny how software degrades like that. Must be time to upgrade again
;-). It's the end of the day and my concentration is waning quickly.
Nonetheless I'd like to send a quick, not entirely coherent response. My
Mozilla must have whatever your Outlook has.......
> Anyway, I want to emphasize that the issue of how people are using RFC 3066
> codes is orthogonal to the issue of whether script codes need to be added to
> RFC 3066. We need them in any event, simply to distinguish written
I think so too. I think that scripts in language codes should be kept
clear of "locale tags". Actually, the more I think about this problem,
the more enamored I become of not even using the word locale and calling
it "internationalization context" or some such.
> That being said, one of the main problems is that people don't have a clear
> notion of what locale codes are supposed to be.
I'm firmly opposed to locales being a form of user preferences. If they
have any utility whatsoever it is in the idea that they produce some
sort of predictable behavior (cf. LDML), even if the behaviors are
broadly defined (at least 24 date/time patterns in each Java
locale---and more than 40 for a CultureInfo, without even considering
the calendar mess).
This leaves the question of what a locale actually is and how, if at
all, it is different than a language.
A case can be made that certain data elements commonly found in locales,
such as default separators, date/time patterns, and numeric patterns are
culturally linked rather than linked to a specific language. This is the
argument that I was presenting in item #6. Aside from a few data points
like this, and given that 3066 already has regional identifiers embedded
into it, I think we can safely ignore this argument.
> 1. As far as your points about the location of the script tag, perhaps you
> can explain why it would break current software.
A few brief tests of Servlet with "zh-Hant-HK" produce this result from
"getLocale()": I get a Locale with lang=zh, region=Hant, and variant=HK.
In other words, it produces Simplified Chinese (the nearest "real"
locale being "zh"). I suspect that many implementations are equally
simpleminded, using some form of parsing (in Java, it's probably nothing
more than a StringTokenizer---in fact I wrote one of these myself about
five years ago...).
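
To make the failure concrete, here is a minimal sketch of the kind of
positional parsing described above. It is an illustration only, not the
actual Servlet container code: the class name NaiveLocaleParser and the
method parse() are hypothetical, and the only assumption is that subtags
are assigned purely by position (first = language, second = region,
third = variant).

```java
import java.util.StringTokenizer;

// Hypothetical illustration of a "simpleminded" parser; not taken from
// any real Servlet implementation.
class NaiveLocaleParser {

    // Splits the tag on '-' or '_' and assigns subtags by position only.
    // Returns {language, region, variant}; missing subtags are "".
    static String[] parse(String tag) {
        StringTokenizer tokens = new StringTokenizer(tag, "-_");
        String[] parts = {"", "", ""};
        for (int i = 0; i < parts.length && tokens.hasMoreTokens(); i++) {
            parts[i] = tokens.nextToken();
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("zh-Hant-HK");
        // The script subtag lands in the region slot.
        System.out.println("lang=" + p[0] + ", region=" + p[1]
                + ", variant=" + p[2]);
    }
}
```

Since no locale data exists for a region called "Hant", a lookup based
on these pieces would fall back to plain "zh".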
Perhaps we can reframe this issue. Everyone duck, because no one will
like the following. It's a truly and utterly awful idea....
A case can be made that script variation in RFC3066 defines a separate
language. There is a precedent for this. It's the Nynorsk/Bokmal pairing
(I said to duck). If this pair, which represents a "mere" orthographic
difference merits separate codes, then shouldn't we be able to get
separate codes for Chinese and Azeri?
I'm not suggesting new 639 codes, but we could do something hokey like:
"zh.hant" and "zh.hans"
"az.latn" and "az.cyrl"
On the positive side, it still fools existing software, but the
"breakage" is different in quality. At least it produces the "null"
locale and could be "trained" to produce zh-HK. [I'm of the opinion that
zh-Hant-HK could produce a Java locale zh_HK_Hant or (zh-Hant)_HK.]
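
The bracketed mapping could be sketched roughly as follows. This is one
possible reading, not a proposal for a real API: ScriptAwareMapper and
toJavaStyle() are hypothetical names, and the sketch assumes that any
four-letter subtag is a script code and any two-letter subtag after the
language is a region.

```java
// Hypothetical sketch: move a script subtag into the variant position
// so that "zh-Hant-HK" becomes the Java-style name "zh_HK_Hant".
class ScriptAwareMapper {

    static String toJavaStyle(String tag) {
        String[] subtags = tag.split("-");
        String language = subtags[0];
        String region = "";
        String script = "";
        for (int i = 1; i < subtags.length; i++) {
            String s = subtags[i];
            if (s.length() == 4 && script.isEmpty()) {
                script = s;   // assumed: four letters means a script code
            } else if (s.length() == 2 && region.isEmpty()) {
                region = s;   // assumed: two letters means a region code
            }
        }
        StringBuilder out = new StringBuilder(language);
        if (!region.isEmpty() || !script.isEmpty()) {
            out.append('_').append(region);
        }
        if (!script.isEmpty()) {
            out.append('_').append(script);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaStyle("zh-Hant-HK"));
        System.out.println(toJavaStyle("zh-HK"));
    }
}
```

With the region kept in its usual slot, fallback from zh_HK_Hant would
pass through zh_HK before reaching zh.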
> 2. xml:lang. Hard to say. If what is meant by locale is the narrow meaning
> above, then it does little harm to transmit that information with xml:lang.
> I agree that to depend on that to transmit the broad sense (a) for locale
> would be bad.
Maybe I agree on some level. But I have actual cases of how this leads
to problems and fallacies in development. I'm putting a document
together on the "aggregation" pattern for WSTF anyway, which
encapsulates why I react so vehemently to the idea that xml:lang can be
a "locale". When I post that I'll send a pointer to it, rather than
noodle around here.
Interestingly, of course, xml:lang does lead, even in my argument
against it, to a locale object. If you want to do natural language
processing (think text breaking, for example) on something tagged with
an xml:lang, it makes sense to use the xml:lang as a locale for that
purpose. But I digress.
> 3. I really wouldn't want to see anything like your:
Agreed. We need to figure this mess out in the cleanest possible way. I
think that is coöpting 3066.