LANGUAGE TAG REGISTRATION FORM

Mark Davis mark.davis at jtcsv.com
Thu Apr 10 18:02:13 CEST 2003


Please refrain from jumping to conclusions, and *SHOUTING*. This is not an
issue with ICU being broken, and I never said it was. Let's try this point
by point.

> >  > An even more general concern: Is there anything in az-latn-az that
> >>  is different from az-latn (and same for Cyrillic)? In other words,
> >>  do we need az-latn-az (and similar for uz and sp) at all?

1. ICU would like to use RFC-3066 codes to distinguish languages.
2. There are major systems that distinguish written languages on the above
basis (with the country code for orthographies), systems that ICU must
interwork with.
3. For ICU to be able to do (1), the RFC-3066 codes have to be able to also
distinguish the languages in (2).
4. Using the country code to distinguish orthographies is *not* a new
concept; RFC-3066 permits a country code with *any* ISO 639 code.
5. Why should the Azeri with Latin script be permitted fewer distinctions
than English or other languages?
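
To make the distinction in the points above concrete, here is a minimal
sketch in Python of how tags such as az-Latn and az-Latn-AZ compare once
they are split into subtags. It assumes only that subtags are separated by
hyphens and compared case-insensitively. The helper names split_tag and
is_refinement are hypothetical; this is not ICU's API, and it does not
validate tags against RFC 3066 or any registry.

```python
def split_tag(tag):
    """Split a language tag into lowercase subtags on hyphens."""
    return tag.lower().split("-")


def is_refinement(general, specific):
    """Return True if `specific` begins with all subtags of `general`
    and carries at least one additional subtag."""
    g = split_tag(general)
    s = split_tag(specific)
    return len(s) > len(g) and s[:len(g)] == g


if __name__ == "__main__":
    # Illustrative tags taken from the discussion; no registry lookup is done.
    for tag in ["az-Latn", "az-Latn-AZ", "az-Cyrl-AZ", "en-US"]:
        print(tag, split_tag(tag))
    print(is_refinement("az-Latn", "az-Latn-AZ"))
    print(is_refinement("az-Latn", "az-Cyrl-AZ"))
```

A system that keys its data on the full tag treats az-Latn and az-Latn-AZ
as separate identifiers even though one is a prefix of the other, and that
is the kind of distinction an interoperating library has to be able to
express.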

The only reason for ICU to be broken would be if RFC-3066 did not allow it
to make the distinctions it needs in order to interoperate. The only
other choice would be for ICU itself to use codes "based on", but not
identical to, RFC 3066, and leave others to deal with the differences. Of
course, we would prefer not to do that.

> (I like ICU, but I find it clumsy and difficult to navigate and hard
> to file bugs for and my biggest worry is that there is no information
> about the authority for any of its entries.)

The first bit is kind of you to say. That you find it hard to navigate is
not surprising; it is a library of code, not an end-user product.

As to the authority for its entries: the data was originally compiled by IBM
internal country sources. We then cross-checked it against other platforms,
and asked internal IBM country sources to make an acceptable choice. (Often
you get much better information when you ask "Which of A, B, or C is best?",
rather than asking "Is A right?")  You can see examples of comparison data
on http://oss.software.ibm.com/cvs/icu/~checkout~/locale/all_diff_xml/ for
different languages. However, this is only for the data from the top-tier
countries; other data is marked as experimental.

We don't have provision for tagging each bit of data with its ultimate
source, although there are CVS comments about different changes over the
history of the data if you are curious. I doubt that you will find even that
much from most systems with such data; try finding out the source of Windows
or Solaris data. LDML
(http://oss.software.ibm.com/cvs/icu/~checkout~/locale/locale_data_markup.html)
does have provision for tagging data with standards; we will see how
much that capability is used.

(And don't get me started about the ISO TR 14652 or ISO/IEC 15897
repository; you know my opinion of the quality of that data.)

Mark
(مرقص بن داود)
________
mark.davis at jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message -----
From: "Michael Everson" <everson at evertype.com>
To: <ietf-languages at iana.org>
Sent: Thursday, April 10, 2003 14:01
Subject: Re: LANGUAGE TAG REGISTRATION FORM


> At 13:13 -0700 2003-04-10, Mark Davis wrote:
> >  > An even more general concern: Is there anything in az-latn-az that
> >>  is different from az-latn (and same for Cyrillic)? In other words,
> >>  do we need az-latn-az (and similar for uz and sp) at all?
> >
> >I know of at least one prominent system that distinguishes them in
> >practice, so we (ICU) need to maintain that distinction.
>
> *THAT* kind of statement is not good enough for me. These are
> language tags. If there is no difference between az-Latn and
> az-Latn-AZ then two tags will not be registered. If ICU is broken,
> you must fix it. I do not want us to kludge 3066 or 15924 to cater
> for the difficulties of particular implementations.
>
> (I like ICU, but I find it clumsy and difficult to navigate and hard
> to file bugs for and my biggest worry is that there is no information
> about the authority for any of its entries.)
> --
> Michael Everson * * Everson Typography *  * http://www.evertype.com
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>


