[Fwd: [Fwd]: Response to Mark's message]

Tue Apr 1 22:10:27 CEST 2003

All,

Please find, for the record, my response to Mark's email, which was sent
to the private thread.
-- 
Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

+1 408.962.5487  mailto:aphillips at webmethods.com
-------------------------------------------
Internationalization is an architecture. It is not a feature.

Chair, W3C I18N WG Web Services Task Force
http://www.w3.org/International/ws

------------
Hi Mark et al,

Thanks for the update. We had a pretty good discussion in Prague, which
has caused me to revisit my original thinking about locale tags and my
earlier discussions of them, and to consider your suggestion of using
RFC3066 for that purpose.

I think I have some reservations, which are outlined below my signature.
Some of these come from my discussions of locale tags with various
people on this mail and in other places. In particular, I'm concerned
that similar discussions on the topic of locale tags led me to different
conclusions because I think I understand the requirements differently.
If we can find a public forum to discuss this in, that would be great.

I think that the edge cases your paper discusses for language tags
should not at all be conflated with locale tags, even though locale
tags, by their nature, need to incorporate changes to language
identification. From my parochial perspective, I don't actually "care"
all that much about language tags (other than that they are well
defined, meet their requirements, and so forth) [that is, I care, but
that's not the problem I'm working on.]

I would like to see significant progress made in this area. Lack of a
real standard is hampering progress in building interoperable
multilocale products, especially Web services, IMO.

(Some of) my thoughts are below my signature in this email, but
personally I support Mark's position that an "RFC3066bis" could be used
for locales in all the cases that I currently care very much about.
Other locale variations (that is, those things generally covered by
variants) could be handled by additional attribute values that
complement "locales" or through the use of proper internationalization
of data structures and processing.

Where do we go to talk about this? And where can I catch up with this
thread (is it necessary to do so?)?

Thanks,

Addison

Addison P. Phillips
Director, Globalization Architecture
webMethods, Inc.

+1 408.962.5487 (phone)  +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture.
It is not a feature.

-*-
My concerns about adding script codes to RFC3066 to form a "locale tag" are:

1. Inserting script codes into the middle of the tag (as opposed to
adding them to the front, Macintosh style, or to the back, variant style)
breaks many existing "locale" implementations that are based on parsing
RFC3066, such as those in Servlet, CultureInfo, etc. I'd rather avoid a
versioning problem, if possible. If it were just Azeri, it wouldn't be
such a big concern, since Azeri has relatively few existing
implementations that would be affected. Chinese is kind of a sticking
point. Are you proposing that script codes would become the de facto way
of writing an RFC3066 tag? How does this mapping play into existing
locale implementations? I'm not sure, but I think it may involve writing
a lot of custom, proprietary code.
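
To make the parsing concern in point 1 concrete, here is a minimal
sketch in Python. It is not taken from Servlet, CultureInfo, or any
other real implementation: naive_parse and script_aware_parse are
hypothetical helpers, and "zh-Hant-TW" is only an assumed example of a
tag with a script code in the middle. The script-aware version also
assumes that a four-letter alphabetic second subtag is always a script
code.

```python
def naive_parse(tag):
    """Read a tag as language[-region[-variant]], with no notion of script.

    Hypothetical stand-in for parsers written before script subtags.
    """
    parts = tag.split("-")
    language = parts[0]
    region = parts[1] if len(parts) > 1 else None
    variant = "-".join(parts[2:]) or None
    return {"language": language, "region": region, "variant": variant}


def script_aware_parse(tag):
    """Read a tag as language[-script][-region[-variant]].

    Assumption: a four-letter alphabetic second subtag is a script code.
    """
    parts = tag.split("-")
    result = {"language": parts[0], "script": None,
              "region": None, "variant": None}
    rest = parts[1:]
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        result["script"] = rest.pop(0)
    if rest:
        result["region"] = rest.pop(0)
    if rest:
        result["variant"] = "-".join(rest)
    return result


# The older parser puts the script where it expects a region.
print(naive_parse("zh-Hant-TW"))
# {'language': 'zh', 'region': 'Hant', 'variant': 'TW'}

print(script_aware_parse("zh-Hant-TW"))
# {'language': 'zh', 'script': 'Hant', 'region': 'TW', 'variant': None}
```

Every deployed parser of the first kind would need a change like the
second one, which is the versioning problem I'd like to avoid.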

2. I don't like the use of xml:lang as a "locale" anything. xml:lang,
IMO, should be strictly a language tag and never, ever a locale
identifier, because it is generally an attribute, has a scope, and
is otherwise mixed up with identifying the natural language of content.
Mixing its use violates the idea of not having locale enter into data
structures where it doesn't belong and is just basically confusing. For
reference, the quote is:

http://www.w3.org/TR/REC-xml#sec-lang-tag

When I was playing with the ULocales proposal I started by thinking of
RFC3066-like tags (the idea was something like an "Accept-Locale" for
Web services), and my first thought was that this was too confusing for
the average developer. The reason that ULocales look the way they do
is, in part, a deliberate move to separate language, region, script, and
variant codes into fields so that you didn't get:

Accept-Language: en-US
Accept-Locale: en-US

But rather something like this:

Accept-Language: en-US
Accept-Locale: lang=en-US:region=US:script=Latn:currency=EUR:...etc...

It also allowed variant transitivity to be dealt with in a
reasonable way (by code, rather than by the locale fallback system,
since the script can be applied to the "lang" value repeatedly).
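
As a rough illustration of the field-separated form in point 2, the
sketch below splits such a value into named fields. It is only a sketch
under assumptions: "Accept-Locale" is the hypothetical header discussed
above, parse_locale_fields is an invented helper name, and the only
syntax assumed is colon-separated key=value pairs as in the example
line. The elided "...etc..." part is left out of the sample input.

```python
def parse_locale_fields(value):
    """Split 'key=value:key=value' into a dict of named fields.

    Hypothetical helper; the syntax is inferred from the example only.
    """
    fields = {}
    for part in value.split(":"):
        if not part:
            continue  # tolerate a stray or trailing colon
        key, sep, val = part.partition("=")
        if not sep:
            raise ValueError("field without '=': %r" % part)
        fields[key.strip()] = val.strip()
    return fields


locale = parse_locale_fields("lang=en-US:region=US:script=Latn:currency=EUR")
print(locale["lang"])      # en-US
print(locale["script"])    # Latn
print(locale["currency"])  # EUR
```

Because each field is named, a consumer can read the script or currency
directly instead of guessing what a given position in the tag means.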

3. IMO, locales are not attributes of data structures. They are more
like attributes of the processing environment. This is why I chose to
use URN syntax: a URN can be the value of an attribute, but the more
common use is as a tag or link. Use as an attribute can be deprecated
and confusion with xml:lang avoided.

4. There was a lot of discussion of supra- ("Latin America") and
sub-national ("Kurdistan") locale requirements, as well as cross-locale
preferences. ISO3166 will never help us here. I don't personally think
it matters, but some folks are vociferous about it. The use of country
as a form of locale identification (or language identification) is
somewhat arbitrary.

5. There are folks who are adamant that RFC3066 is not anything but a
language tag. It certainly says that very clearly in a lot of places in
the RFC and in documents that reference it---and I've been flamed by
some of these folks in the past for suggesting ever so gently that the
tags are often construed by others to be the same as a locale (even if
they should not be). If those folks can be convinced to make the tags do
"double duty", I won't object, but acceptance is a consideration.

5a. Some other folks are adamant that "locale" and "language" can be
effectively split in the other direction. That is, that language
identifies nothing except the content language used and that the
"locale" identifies a particular target market and formatting/processing
conventions associated with it. IOW that the purity of locales is being
damaged by their association with a language. I'm not sure I buy (or
even quite get) this argument, but Web site app designers seem
especially fond of it.

6. Finally, we have to be able to explain this clearly and cogently to
non-I18N folks. They have to understand exactly what the semantics are.
My thinking here was originally that a new thing that incorporates 3066
as the language portion and is clearly labeled as a "locale identifier"
would be easier to explain... and that use of 3066 as a "fallback" (e.g.
as Servlet etc. do today) would be permitted, but not encouraged. By
maintaining a strict separation, we can avoid confusion such as "I used
xml:lang on my SOAPFault as the locale for that line item." and other
such nonsense. Again, being somewhat conservative, I like the idea of
just formally declaring RFC3066 as both a locale and language tag (and
then figuring out how to deal with separating the usage on a
case-by-case basis). It's even easier not to define anything new and
just declare what every non-internationalization person already
"knows": that RFC3066 is a locale tag.
