[Fwd]: Response to Mark's message]

Fri Apr 4 13:00:16 CEST 2003

Thanks for your comments.

1. I rather like your idea of using zh.hant; we should test it out on
various software.

2. Your talk at the Prague conference would be very good for background; is
it posted somewhere that people on this list can get to?

Back to your original message:

> 3. IMO, locales are not attributes of data structures. They are more
> like attributes of the processing environment. This is why I chose to
> use URN syntax: a URN can be the value of an attribute, but the more
> common use is as a tag or link. Use as an attribute can be deprecated
> and confusion with xml:lang avoided.

I'm a little confused by "not attributes of data structures". A simple
reading of that would be that they wouldn't be exchanged in protocols --
which are essentially data structures -- but I'm sure you didn't mean that.
Maybe you meant that "they should not be attributes of text in documents"?
(which I would agree to)

> 4. There was a lot of discussion of supra- ("Latin America") and
> sub-national ("Kurdistan") locale requirements, as well as cross-locale
> preferences. ISO3166 will never help us here. I don't personally think
> it matters, but some folks are vociferous about it. The use of country
> as a form of locale identification (or language identification) is
> somewhat arbitrary.

Actually, I think there is some progress on that front; Michael Everson can
correct me if I am wrong, but I think they are adding some of the supras.

They have already, in some cases, subs, although they are not as
fine-grained as one might need in the slippery-slope cases (e.g. cities or
counties in the US, for calculating sales tax). And once again, I have
doubts as to whether the more detailed kind of information like counties
belongs in a "locale". It is clearly information that someone may need to
communicate, but it makes "locale" more of a kitchen sink.

Mark
(مرقص بن داود)
________
mark.davis at jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Addison Phillips [wM]" <aphillips at webmethods.com>
To: "Mark Davis" <mark.davis at jtcsv.com>
Cc: "Ietf-languages" <Ietf-languages at alvestrand.no>
Sent: Thursday, April 03, 2003 21:46
Subject: Re: [Fwd]: Response to Mark's message]

> Funny how software degrades like that. Must be time to upgrade again
> ;-). It's the end of the day and my concentration is waning quickly.
> Nonetheless I'd like to send a quick, not entirely coherent response. My
> Mozilla must have whatever your Outlook has.......
>
> >
> > Anyway, I want to emphasize that the issue of how people are using RFC 3066
> > codes is orthogonal to the issue of whether script codes need to be added to
> > RFC 3066. We need them in any event, simply to distinguish written
> > languages.
>
> I think so too. I think that scripts in language codes should be kept
> clear of "locale tags". Actually, the more I think about this problem,
> the more enamored I become of not even using the word locale and calling
> it "internationalization context" or some such.
>
> >
> > That being said, one of the main problems is that people don't have a clear
> > notion of what locale codes are supposed to be.
>
> Agreed.
>
> I'm firmly opposed to locales being a form of user preferences. If they
> have any utility whatsoever it is in the idea that they produce some
> sort of predictable behavior (cf. LDML), even if the behaviors are
> broadly defined (at least 24 date/time patterns in each Java
> locale---and more than 40 for a CultureInfo, without even considering
> the calendar mess).
>
> This leaves the question of what a locale actually is and how, if at
> all, it is different than a language.
>
> A case can be made that certain data elements commonly found in locales,
> such as default separators, date/time patterns, and numeric patterns are
> culturally linked rather than linked to a specific language. This is the
> argument that I was presenting in item #6. Aside from a few data points
> like this, and given that 3066 already has regional identifiers embedded
> into it, I think we can safely ignore this argument.
>
> >
> > 1. As far as your points about the location of the script tag, perhaps you
> > can explain why it would break current software.
>
> A few brief tests of Servlet with "zh-Hant-HK" produce this result from
> "getLocale()": I get a Locale with lang=zh, region=Hant, and variant=HK.
> In other words, it produces Simplified Chinese (the nearest "real"
> locale being "zh"). I suspect that many implementations are equally
> simpleminded, using some form of parsing (in Java, it's probably nothing
> more than a StringTokenizer---in fact I wrote one of these myself about
> five years ago...).
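
The positional parsing described above can be sketched roughly as follows. This is an illustration in Python, not the Servlet or Java code itself: the function names and the AVAILABLE set are made up for the example, and the fallback rule (drop trailing fields until something matches) is an assumption about how such a parser might pick the "nearest" locale.

```python
# Hypothetical set of locales the runtime actually has data for.
AVAILABLE = {"zh", "zh-HK", "en", "en-US"}

def naive_parse(tag):
    """Split on '-' and assign fields by position only (no script awareness)."""
    parts = tag.split("-")
    language = parts[0].lower()
    region = parts[1] if len(parts) > 1 else ""
    variant = "-".join(parts[2:])
    return language, region, variant

def nearest_available(tag):
    """Drop trailing fields until something in AVAILABLE matches (assumed rule)."""
    language, region, variant = naive_parse(tag)
    fields = [f for f in (language, region, variant) if f]
    while fields:
        candidate = "-".join(fields)
        if candidate in AVAILABLE:
            return candidate
        fields.pop()
    return None

print(naive_parse("zh-Hant-HK"))        # ('zh', 'Hant', 'HK')
print(nearest_available("zh-Hant-HK"))  # zh
```

Because "Hant" lands in the region slot, the "HK" information is lost on fallback and the lookup ends at plain "zh", even though "zh-HK" is available.
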
>
> Perhaps we can reframe this issue. Everyone duck, because no one will
> like the following. It's a truly and utterly awful idea....
>
> A case can be made that script variation in RFC3066 defines a separate
> language. There is a precedent for this. It's the Nynorsk/Bokmal pairing
> (I said to duck). If this pair, which represents a "mere" orthographic
> difference merits separate codes, then shouldn't we be able to get
> separate codes for Chinese and Azeri?
>
> I'm not suggesting new 639 codes, but we could do something hokey like:
>
> "zh.hant" and "zh.hans"
>
> or
>
> "az.latn" and "az.cyrl"
>
> On the positive side, it still fools existing software, but the
> "breakage" is different in quality. At least it produces the "null"
> locale and could be "trained" to produce zh-HK. [I'm of the opinion that
> zh-Hant-HK could produce a Java locale zh_HK_Hant or (zh-Hant)_HK.]
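
A rough sketch of the two behaviors mentioned above, again in Python and under stated assumptions: KNOWN and ALIASES are hypothetical tables, "training" is read here as adding an alias entry, and the reordering rule simply treats a four-letter second subtag as a script. None of this is taken from an actual implementation.

```python
KNOWN = {"zh", "zh-HK", "az"}       # hypothetical installed locales
ALIASES = {"zh.hant": "zh-HK"}      # hypothetical "training" table

def lookup(tag):
    """Resolve a tag by alias first, then by dropping trailing subtags."""
    tag = ALIASES.get(tag.lower(), tag)
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in KNOWN:
            return candidate
        parts.pop()
    return None  # the "null" locale: nothing matched

def to_java_style(tag):
    """Move a four-letter script subtag behind the region: zh-Hant-HK -> zh_HK_Hant."""
    parts = tag.split("-")
    if len(parts) == 3 and len(parts[1]) == 4 and parts[1].isalpha():
        language, script, region = parts
        return "_".join([language, region, script])
    return "_".join(parts)

print(lookup("zh.hans"))            # None
print(lookup("zh.hant"))            # zh-HK
print(to_java_style("zh-Hant-HK"))  # zh_HK_Hant
```

An untrained dotted tag fails outright instead of silently matching "zh", which is the difference in the quality of breakage being described.
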
>
> >
> > 2. xml:lang. Hard to say. If what is meant by locale is the narrow meaning
> > above, then it does little harm to transmit that information with xml:lang.
> > I agree that to depend on that to transmit the broad sense (a) for locale
> > would be bad.
>
> Maybe I agree on some level. But I have actual cases of how this leads
> to problems and fallacies in development. I'm putting a document
> together on the "aggregation" pattern for WSTF anyway, which
> encapsulates why I react so vehemently to the idea that xml:lang can be
> a "locale". When I post that I'll send a pointer to it, rather than
> noodle around here.
>
> Interestingly, of course, xml:lang does lead, even in my argument
> against it, to a locale object. If you want to do natural language
> processing (think text breaking, for example) on something tagged with
> an xml:lang, it makes sense to use the xml:lang as a locale for that
> purpose. But I digress.
>
> > 3. I really wouldn't want to see anything like your:
> >
> >>Accept-Language: en-US
> >>Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...
> >
> Agreed. We need to figure this mess out in the cleanest possible way. I
> think that is coöpting 3066.
>
> Addison
>
>