[Fwd: [Fwd]: Response to Mark's message]

Mark Davis mark.davis at jtcsv.com
Wed Apr 2 22:08:37 CEST 2003

The dog ate my homework. (Actually Outlook Express died while I was in the
middle of composition. Very annoying; it used to be pretty solid, but is now
fairly unstable -- perhaps it was one of the weekly Windows security

Anyway, I want to emphasize that the issue of how people are using RFC 3066
codes is orthogonal to the issue of whether script codes need to be added to
RFC 3066. We need them in any event, simply to distinguish written

That being said, one of the main problems is that people don't have a clear
notion of what locale codes are supposed to be.
a. On one extreme, locale codes are to be used to encapsulate any sorts of
preferences that some user might have: language, currency, timezone, country
of origin, country of residency, county/city (for sales tax calculations),
religious preference, sexual preference(s), seating assignment, etc. To add
structure to RFC 3066 to encompass any of these would be horrible.
b. On the other extreme, locale codes are pretty much restricted to
language; one might guess at certain defaults from the language, but those
would only be guesses. If that is the case, then there is little harm in
using RFC 3066 to indicate this narrow form of 'locale', since it is
essentially just language.

1. As far as your points about the location of the script tag, perhaps you
can explain why it would break current software. It is understandable that
people that have 'extended' RFC 3066 because it didn't cover what they
needed (Windows, Mac) would need to map between the RFC 3066bis and what
they have now. (Amusingly -- unless you have to deal with it! -- Windows
even has two completely different sets of 3-letter language tags, for
CultureInfo and OpenType, that even conflict: look at
http://oss.software.ibm.com/icu/dropbox/CultureInfo_OpenType.xls; ARI means
Iraqi Arabic in one and the Aari language in the other, ugh). So for some
cases, somebody will have to map. But the issue is whether existing software
that got something of the form "zh-Hani-HK" would actually break. Perhaps
you can give some examples from your experience.
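
To make the question concrete, here is a minimal Python sketch, not taken from any real implementation. split_naive is a hypothetical stand-in for software that assumes the second subtag is always the region. split_with_script assumes a 4-letter second subtag is an ISO 15924 script code, which cannot collide with the 2-letter ISO 3166 country codes that RFC 3066 puts in that position. Both function names and the returned dictionary layout are invented for illustration.

```python
def split_naive(tag):
    """Hypothetical pre-script parser: language, then region, then variants."""
    parts = tag.split("-")
    return {
        "language": parts[0].lower(),
        # Assumes whatever comes second is a region code.
        "region": parts[1].upper() if len(parts) > 1 else None,
        "variants": parts[2:],
    }


def split_with_script(tag):
    """Sketch: treat a 4-letter second subtag as a script code (assumption)."""
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    rest = parts[1:]
    # A 4-letter alphabetic subtag right after the language is read as a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0).title()
    # A 2-letter alphabetic subtag in the next position is read as a region.
    if rest and len(rest[0]) == 2 and rest[0].isalpha():
        result["region"] = rest.pop(0).upper()
    result["variants"] = rest
    return result


print(split_naive("zh-Hani-HK"))        # region comes out as "HANI"
print(split_with_script("zh-Hani-HK"))  # script "Hani", region "HK"
print(split_with_script("en-US"))       # script None, region "US"
```

Under these assumptions the naive splitter does not crash; it quietly reports a region that does not exist. Whether that counts as "breaking" depends on what the surrounding software then does with the bogus region, which is exactly where real examples would help.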

2. xml:lang. Hard to say. If what is meant by locale is the narrow meaning
above, then it does little harm to transmit that information with xml:lang.
I agree that to depend on that to transmit the broad sense (a) for locale
would be bad.
3. I really wouldn't want to see anything like your:
> Accept-Language: en-US
> Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...
since it would open up the possibility of combinations like:
> Accept-Language: de-DE
> Accept-Locale: lang=en-US:region=FR:script=Latn:currency=EUR:...etc...
Where I would have no cotton-picking idea of how to interpret this in a
consistent fashion. Far better to transmit extra information aside from the
Language; or disallow the possibility of transmitting both.
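
The ambiguity can be shown with a small Python sketch. It assumes the colon-separated key=value syntax of the hypothetical Accept-Locale header quoted above, and an Accept-Language value holding a single tag with no q-values. The function names and the shape of the conflict tuples are invented for illustration; neither header pairing is defined by any specification.

```python
def parse_accept_locale(value):
    """Split a hypothetical 'key=value:key=value' Accept-Locale value."""
    fields = {}
    for item in value.split(":"):
        key, sep, val = item.partition("=")
        if sep:  # skip fragments without '=', such as a trailing "...etc..."
            fields[key.strip().lower()] = val.strip()
    return fields


def find_conflicts(accept_language, accept_locale):
    """Return (kind, first_value, second_value) tuples for each disagreement."""
    fields = parse_accept_locale(accept_locale)
    conflicts = []
    lang = fields.get("lang")
    # The two headers name different languages.
    if lang is not None and lang.lower() != accept_language.lower():
        conflicts.append(("language", accept_language, lang))
    # The region inside 'lang' disagrees with the separate 'region' field.
    region = fields.get("region")
    if lang is not None and region is not None:
        parts = lang.split("-")
        if len(parts) > 1 and len(parts[1]) == 2 and parts[1].upper() != region.upper():
            conflicts.append(("region", parts[1], region))
    return conflicts


print(find_conflicts("en-US", "lang=en-US:region=US:script=Latn:currency=EUR"))
print(find_conflicts("de-DE", "lang=en-US:region=FR:script=Latn:currency=EUR"))
```

The first call finds nothing to complain about; the second reports both a language conflict and a region conflict. Detecting the contradiction is easy. Deciding which value should win is the part with no consistent answer.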
(I'll have to continue the rest of the message later)

mark.davis at jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Addison Phillips [wM]" <aphillips at webmethods.com>
To: "Ietf-languages" <Ietf-languages at alvestrand.no>
Sent: Tuesday, April 01, 2003 21:10
Subject: [Fwd: [Fwd]: Response to Mark's message]

> All,
> Please find, for the record, my response to Mark's email, which was sent
> to the private thread.
> --
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> +1 408.962.5487  mailto:aphillips at webmethods.com
> -------------------------------------------
> Internationalization is an architecture. It is not a feature.
> Chair, W3C I18N WG Web Services Task Force
> http://www.w3.org/International/ws
> ------------
> Hi Mark et al,
> Thanks for the update. We had a pretty good discussion in Prague, which
> has caused me to think back to my original thinking about and discussion
> of locale tags and consider your suggestion of using RFC3066 for that
> purpose.
> I think I have some reservations, which are outlined below my signature.
> Some of these come from my discussions of locale tags with various
> people on this mail and in other places. In particular, I'm concerned
> that similar discussions on the topic of locale tags led me to different
> conclusions because I think I understand the requirements differently.
> If we can find a public forum to discuss this in, that would be great.
> I think that the edge cases your paper discusses for language tags
> should not at all be conflated with locale tags, even though locale
> tags, by their nature, need to incorporate changes to language
> identification. From my parochial perspective, I don't actually "care"
> all that much about language tags (other than that they are well
> defined, meet their requirements, and so forth) [that is, I care, but
> that's not the problem I'm working on.]
> I would like to see significant progress made in this area. Lack of a
> real standard is hampering progress in building interoperable
> multilocale products, especially Web services, IMO.
> (Some of) my thoughts are below my signature in this email, but
> personally I support Mark's position that an "RFC3066bis" could be used
> for locales in all the cases that I currently care very much about.
> Other locale variations (that is, those things generally covered by
> variants) could be handled by additional attribute values that
> complement "locales" or through the use of proper internationalization
> of data structures and processing.
> Where do we go to talk about this? And where can I catch up with this
> thread (is it necessary to do so?)?
> Thanks,
> Addison
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> +1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
> -------------------------------------------------
> Internationalization is an architecture.
> It is not a feature.
> -*-
> My concerns about adding script codes to RFC3066 to form a "locale tag"
> 1. Inserting script codes into the middle of the tag (as opposed to
> adding it to the front, Macintosh style, or to the back, variant style)
> breaks many existing "locale" implementations that are based on parsing
> RFC3066, such as that in Servlet, CultureInfo, etc. I'd rather avoid a
> versioning problem, if possible. If it were just Azeri, it wouldn't be
> such a big concern, since Azeri has relatively few existing
> implementations that would be affected. Chinese is kind of a sticking
> point. Are you proposing that script codes would become the de facto way
> of writing an RFC3066 tag? How does this mapping play into existing
> locale implementations? I'm not sure, but I think it may involve writing
> a lot of custom, proprietary code.
> 2. I don't like the use of xml:lang as a "locale" anything. xml:lang,
> IMO, should be strictly a language tag and never, ever a locale
> identifier because it is generally an attribute, has a scope, and
> otherwise is mixed up with identifying natural language of content.
> Mixing its use violates the idea of not having locale enter into data
> structures where it doesn't belong and is just basically confusing. For
> reference, the quote is:
> http://www.w3.org/TR/REC-xml#sec-lang-tag
> When I was playing with the ULocales proposal I started by thinking of
> RFC3066-like tags (the idea was something like an "Accept-Locale" for
> Web services), and my first thought was that this was too confusing for
> the average developer. The reason that ULocales look the way they do
> is, in part, a deliberate move to separate language, region, script, and
> variant codes into fields so that you didn't get:
> Accept-Language: en-US
> Accept-Locale: en-US
> But rather something like this:
> Accept-Language: en-US
> Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...
> It also allowed variant transitivity to be dealt with in a
> reasonable way (by code, rather than by the locale fallback system,
> since the script can be applied to the "lang" value repeatedly).
> 3. IMO, locales are not attributes of data structures. They are more
> like attributes of the processing environment. This is why I chose to
> use URN syntax: a URN can be the value of an attribute, but the more
> common use is as a tag or link. Use as an attribute can be deprecated
> and confusion with xml:lang avoided.
> 4. There was a lot of discussion of supra- ("Latin America") and
> sub-national ("Kurdistan") locale requirements, as well as cross-locale
> preferences. ISO3166 will never help us here. I don't personally think
> it matters, but some folks are vociferous about it. The use of country
> as a form of locale identification (or language identification) is
> somewhat arbitrary.
> 5. There are folks who are adamant that RFC3066 is not anything but a
> language tag. It certainly says that very clearly in a lot of places in
> the RFC and in documents that reference it---and I've been flamed by
> some of these folks in the past for suggesting ever so gently that the
> tags are often construed by others to be the same as a locale (even if
> they should not be). If those folks can be convinced to make the tags do
> "double duty", I won't object, but acceptance is a consideration.
> 5a. Some other folks are adamant that "locale" and "language" can be
> effectively split in the other direction. That is, that language
> identifies nothing except the content language used and that the
> "locale" identifies a particular target market and formatting/processing
> conventions associated with it. IOW that the purity of locales is being
> damaged by their association with a language. I'm not sure I buy (or
> even quite get) this argument, but Web site app designers seem
> especially fond of it.
> 6. Finally, we have to be able to explain this clearly and cogently to
> non-I18N folks. They have to understand exactly what the semantics are.
> My thinking here was originally that a new thing that incorporates 3066
> as the language portion and is clearly labeled as a "locale identifier"
> would be easier to explain... and that use of 3066 as a "fallback" (e.g.
> as Servlet etc. do today) would be permitted, but not encouraged. By
> maintaining a strict separation, we can avoid confusion such as "I used
> xml:lang on my SOAPFault as the locale for that line item." and other
> such nonsense. Again, being somewhat conservative, I like the idea of
> just formally declaring RFC3066 as both a locale and language tag (and
> then figuring out how to deal with separating the usage on a
> case-by-case basis). It's even easier not to define anything new and
> just declare what every (non-internationalization) person already
> "knows": that RFC3066 is a locale tag.
> --
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
> +1 408.962.5487  mailto:aphillips at webmethods.com
> -------------------------------------------
> Internationalization is an architecture. It is not a feature.
> Chair, W3C I18N WG Web Services Task Force
> http://www.w3.org/International/ws
