[Fwd]: Response to Mark's message]

Fri Apr 4 14:12:16 CEST 2003

Mark Davis wrote:

>Thanks for your comments.
>
>1. I rather like your idea of using zh.hant; we should test it out on
>various software.
>
>2. Your talk at the Prague conference would be very good for background; is
>it posted somewhere that people on this list can get to?
>

At the moment neither the slides nor the document are posted. I'll put 
them on my personal website and post the URL.

>
>
>Back to your original message:
>
>>3. IMO, locales are not attributes of data structures. They are more
>>like attributes of the processing environment. This is  why I chose to
>>use URN syntax: a URN can be the value of an attribute, but the more
>>common use is as a tag or link. Use as an attribute can be deprecated
>>and confusion with xml:lang avoided.
>>
>
>I'm a little confused by "not attributes of data structures". A simple
>reading of that would be that they wouldn't be exchanged in protocols --
>which are essentially data structures -- but I'm sure you didn't mean that.
>Maybe you meant that "they should not be attributes of text in documents"?
>(which I would agree to)
>

The latter. They aren't attributes of text (or numbers or dates or ...). 
IOW a locale >>should<< not be used as a modifier to a data structure. 
It, of course, can itself be exchanged or made to be a field in a data 
structure. But to say that, for example, an "object of type 'x' has a 
locale of 'y'" seems like (at best) very poor internationalization... I 
almost don't care what the object is (the exception being a locale data 
file, I guess).

>
>
>>4. There was a lot of discussion of supra- ("Latin America") and
>>sub-national ("Kurdistan") locale requirements, as well as cross-locale
>>preferences. ISO3166 will never help us here. I don't personally think
>>it matters, but some folks are vociferous about it. The use of country
>>as a form of locale identification (or language identification) is
>>somewhat arbitrary.
>>
>
>Actually, I think there is some progress on that front; Michael Everson can
>correct me if I am wrong, but I think they are adding some of the supras.
>
>They have already, in some cases, subs, although they are not as
>fine-grained as one might need in the slippery-slope cases (e.g. cities or
>counties in the US, for calculating sales tax). And once again, I have
>doubts as to whether the more detailed kind of information like counties
>belongs in a "locale". It is clearly information that someone may need to
>communicate, but it makes "locale" more of a kitchen sink.
>

I don't want to enter into the sloppy world of "fine grained locales". I 
mention the supra/sub issue as one that frequently resurfaces when 
speaking of locales in the abstract, but the actual use of one of these 
structures is generally tied to one's implementation and impl decisions. 
Locales are somewhat sloppy references to start with. Adding additional 
levels of fictitious specificity is a waste of time, especially since 
existing environments will, at best, generally ignore the extra values.

Now I can understand why these distinctions would be of interest for the 
purpose of creating language identifiers. I think that the plans for 
these kinds of distinctions should focus on that aspect.

To go back to the starting point here, if (a) RFC3066 were changed to 
resolve the majority of variant problems with locale identifiers and (b) 
were then opened to being used as locale identifiers, I think we would 
find that there would still be the need for certain 'de facto' practices 
in assigning meaning to the tags. An existing de facto practice is that 
of associating Trad. Chinese with zh-TW. The question in my mind is 
whether any additional "variant forms" need incorporation in order to 
make RFC3066 suitable long term to use as a locale tag (and are these 
modifications desirable) or whether we need to say, clearly and loudly, 
that additional variant forms represent separate "international context" 
that should not be expected from a lowly locale identifier.

Examples:
 -- POSIX @EURO locales (for the purpose of modifying default currency)
 -- @EURO locales (for the purpose of switching default currency/currency
    format--one time event)
 -- default paper size as in Linux style POSIX locales (LC_PAPERSIZE)
 -- default measuring system (SI vs. the USA)
 -- and so on.
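
To make the first two examples concrete, here is a minimal sketch in Python of how a POSIX-style locale name carries a modifier. It assumes the usual language[_territory][.codeset][@modifier] layout; the function name split_posix_locale and the sample name "de_DE.UTF-8@euro" are illustrative only and are not taken from any particular implementation.

```python
def split_posix_locale(name):
    """Split a POSIX-style locale name into its four conventional parts.

    Assumed layout: language[_territory][.codeset][@modifier].
    Missing parts come back as empty strings.
    """
    # The modifier is whatever follows the first '@'.
    base, _, modifier = name.partition("@")
    # The codeset is whatever follows the first '.' in the remainder.
    base, _, codeset = base.partition(".")
    # What is left is language, optionally followed by '_' and a territory.
    language, _, territory = base.partition("_")
    return {
        "language": language,
        "territory": territory,
        "codeset": codeset,
        "modifier": modifier,
    }


print(split_posix_locale("de_DE.UTF-8@euro"))
```

The modifier field is the part that has no counterpart in an RFC3066 tag, which is roughly the gap the examples above are pointing at.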

Best Regards,

Addison

>
>
>Mark
>(مرقص بن داود)
>________
>mark.davis at jtcsv.com
>IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
>(408) 256-3148
>fax: (408) 256-0799
>
>----- Original Message -----
>From: "Addison Phillips [wM]" <aphillips at webmethods.com>
>To: "Mark Davis" <mark.davis at jtcsv.com>
>Cc: "Ietf-languages" <Ietf-languages at alvestrand.no>
>Sent: Thursday, April 03, 2003 21:46
>Subject: Re: [Fwd]: Response to Mark's message]
>
>
>>Funny how software degrades like that. Must be time to upgrade again
>>;-). It's the end of the day and my concentration is waning quickly.
>>Nonetheless I'd like to send a quick, not entirely coherent response. My
>>Mozilla must have whatever your Outlook has.......
>>
>>>Anyway, I want to emphasize that the issue of how people are using RFC
>>>3066 codes is orthogonal to the issue of whether script codes need to be
>>>added to RFC 3066. We need them in any event, simply to distinguish
>>>written languages.
>>>
>>I think so too. I think that scripts in language codes should be kept
>>clear of "locale tags". Actually, the more I think about this problem,
>>the more enamored I become of not even using the word locale and calling
>>it "internationalization context" or some such.
>>
>>>That being said, one of the main problems is that people don't have a
>>>clear notion of what locale codes are supposed to be.
>>>
>>Agreed.
>>
>>I'm firmly opposed to locales being a form of user preferences. If they
>>have any utility whatsoever it is in the idea that they produce some
>>sort of predictable behavior (cf. LDML), even if the behaviors are
>>broadly defined (at least 24 date/time patterns in each Java
>>locale---and more than 40 for a CultureInfo, without even considering
>>the calendar mess).
>>
>>This leaves the question of what a locale actually is and how, if at
>>all, it is different than a language.
>>
>>A case can be made that certain data elements commonly found in locales,
>>such as default separators, date/time patterns, and numeric patterns are
>>culturally linked rather than linked to a specific language. This is the
>>argument that I was presenting in item #6. Aside from a few data points
>>like this, and given that 3066 already has regional identifiers embedded
>>into it, I think we can safely ignore this argument.
>>
>>>1. As far as your points about the location of the script tag, perhaps
>>>you can explain why it would break current software.
>>>
>>A few brief tests of Servlet with "zh-Hant-HK" produce this result from
>>"getLocale()": I get a Locale with lang=zh, region=Hant, and variant=HK.
>>In other words, it produces Simplified Chinese (the nearest "real"
>>locale being "zh"). I suspect that many implementations are equally
>>simpleminded, using some form of parsing (in Java, it's probably nothing
>>more than a StringTokenizer---in fact I wrote one of these myself about
>>five years ago...).
>>
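For illustration only, a minimal Python sketch of the positional parsing described above. It is not the Servlet or java.util.Locale code; the function name naive_parse is hypothetical, and the only assumption is that the parser splits on "-" or "_" and assigns fields by position without knowing what a script subtag is.

```python
import re


def naive_parse(tag):
    """Split a tag on '-' or '_' and assign fields purely by position."""
    parts = re.split(r"[-_]", tag)
    # Pad so that language, region, and variant always exist.
    parts += [""] * (3 - len(parts))
    language, region = parts[0], parts[1]
    # Anything after the second field is lumped into the variant.
    variant = "_".join(parts[2:])
    return {"lang": language.lower(), "region": region, "variant": variant}


# The script subtag lands in the region slot and the real region
# is pushed into the variant.
print(naive_parse("zh-Hant-HK"))
```

With a parse like this, a lookup keyed on language and region finds no match for region "Hant" and falls back to plain "zh".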
>>Perhaps we can reframe this issue. Everyone duck, because no one will
>>like the following. It's a truly and utterly awful idea....
>>
>>A case can be made that script variation in RFC3066 defines a separate
>>language. There is a precedent for this. It's the Nynorsk/Bokmål pairing
>>(I said to duck). If this pair, which represents a "mere" orthographic
>>difference, merits separate codes, then shouldn't we be able to get
>>separate codes for Chinese and Azeri?
>>
>>I'm not suggesting new 639 codes, but we could do something hokey like:
>>
>>"zh.hant" and "zh.hans"
>>
>>or
>>
>>"az.latn" and "az.cyrl"
>>
>>On the positive side, it still fools existing software, but the
>>"breakage" is different in quality. At least it produces the "null"
>>locale and could be "trained" to produce zh-HK. [I'm of the opinion that
>>zh-Hant-HK could produce a Java locale zh_HK_Hant or (zh-Hant)_HK.]
>>
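A rough Python sketch of the "different quality of breakage" point, under stated assumptions: the tag "zh.hant-HK", the KNOWN_LANGUAGES set, and the resolve function are all hypothetical and stand in for whatever lookup a real environment performs.

```python
# Hypothetical set of languages the environment has locale data for.
KNOWN_LANGUAGES = {"zh", "az", "en"}


def resolve(tag, trained=False):
    """Return a 'lang-REGION' string, or None for the 'null' locale."""
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    region = parts[1] if len(parts) > 1 else ""
    # A "trained" implementation drops the dotted script suffix first.
    if trained and "." in lang:
        lang = lang.split(".", 1)[0]
    # An untrained one sees "zh.hant" as an unknown language.
    if lang not in KNOWN_LANGUAGES:
        return None
    return f"{lang}-{region}" if region else lang


print(resolve("zh.hant-HK"))                # untrained: no locale at all
print(resolve("zh.hant-HK", trained=True))  # trained: falls back to zh-HK
```

The untrained case fails visibly instead of quietly putting the script in the region slot, and the region subtag stays where existing parsers expect it.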
>>>2. xml:lang. Hard to say. If what is meant by locale is the narrow
>>>meaning above, then it does little harm to transmit that information with
>>>xml:lang. I agree that to depend on that to transmit the broad sense (a)
>>>for locale would be bad.
>>>
>>Maybe I agree on some level. But I have actual cases of how this leads
>>to problems and fallacies in development. I'm putting a document
>>together on the "aggregation" pattern for WSTF anyway, which
>>encapsulates why I react so vehemently to the idea that xml:lang can be
>>a "locale". When I post that I'll send a pointer to it, rather than
>>noodle around here.
>>
>>Interestingly, of course, xml:lang does lead, even in my argument
>>against it, to a locale object. If you want to do natural language
>>processing (think text breaking, for example) on something tagged with
>>an xml:lang, it makes sense to use the xml:lang as a locale for that
>>purpose. But I digress.
>>
>>>3. I really wouldn't want to see anything like your:
>>>
>>>>Accept-Language: en-US
>>>>Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...
>>>>
>>Agreed. We need to figure this mess out in the cleanest possible way. I
>>think that is coöpting 3066.
>>
>>Addison
>>
>>
>

-- 

Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

Internationalization is an architecture. It is not a feature.

[Chair, W3C-I18N-WG, Web Services Task Force]
http://www.w3.org/International/ws