Review of draft-phillips-langtags-03

Thu Jul 1 21:04:53 CEST 2004

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Mike Ksar

> Any future extensions, such as allowing/encouraging private-use "q"
> tags, private-use "x" tags and the "x-" singleton, should not be part
> of the conformance clause for this RFC.

I'd like to elaborate on concerns we have about the private-use aspects
of this draft.

A recommendation ("SHOULD") is given to use the identifier ranges
defined in ISO 639-2 (viz., qaa - qtz), ISO 15924 (viz., Qaaa - Qabx)
and ISO 3166 (viz., AA, QM - QZ, XA - XZ, ZZ) for private-use tags. The
rationale given (in email on this list, not in the draft) is that, e.g.,
"en-Qdej is more informative than x-qwertyuiop".
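
To make the ranges above concrete, here is a minimal sketch in Python of
how a processor might test whether a subtag falls in one of those
private-use ranges. The function names are my own, purely for
illustration; nothing here is taken from the draft.

```python
# Sketch only: function names are hypothetical, not from the draft.
# Ranges are the ones cited above from ISO 639-2, ISO 15924 and ISO 3166.

def _ascii_letters(s: str, length: int) -> bool:
    """True if s consists of exactly `length` ASCII letters."""
    return len(s) == length and s.isascii() and s.isalpha()

def is_private_use_language(subtag: str) -> bool:
    """ISO 639-2 local-use range: qaa - qtz."""
    s = subtag.lower()
    return _ascii_letters(s, 3) and "qaa" <= s <= "qtz"

def is_private_use_script(subtag: str) -> bool:
    """ISO 15924 private-use range: Qaaa - Qabx."""
    s = subtag.lower()
    return _ascii_letters(s, 4) and "qaaa" <= s <= "qabx"

def is_private_use_region(subtag: str) -> bool:
    """ISO 3166 user-assigned codes: AA, QM - QZ, XA - XZ, ZZ."""
    s = subtag.upper()
    if not _ascii_letters(s, 2):
        return False
    return s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ"
```

Such a check tells a processor only that a subtag is private-use; it
says nothing about what the subtag means.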

The question is whether private-use tags constructed as recommended are
more informative than x-... tags *in a useful way*.

For cases like en-Qabc vs. x-qwertyuiop, it can be useful to know that
the variety in question is English. Similarly for en-QM vs.
x-qwertyuiop. In a case like qaa or qaa-Latn or qaa-US vs. x-qwertyuiop,
however, the added information is not particularly helpful: apart from
prior agreement, there is absolutely no clue as to what documents could
be returned or what resources might be applicable.

I think it is far more likely that someone will use a private-use ID to
refer to a language than to refer to a script or region.
(Note: I consider private-use variants, such as en-US-x-pepsi, an
entirely different matter.) For instance, it's more likely someone will
want to tag data as being in the Martian language than to tag data as
being in some familiar language but using the Martian script or being
the variety spoken or spelled on Mars. Therefore, I think it more
important to consider what's appropriate for privately-defined language
IDs than privately-defined script or region IDs.

Given that, and since the recommendation is not particularly useful in
the case of languages, I question the usefulness of the recommendation
overall.

Moreover, I see a specific problem with regard to languages in one
usage scenario: Suppose a platform vendor provides means for
users to create their own locale/culture entities, and part of the
definition includes an RFC 3066bis language tag. Indeed, such a tag may
be getting used as the unique ID for the locale. Now imagine users
creating locale definitions for languages such as Mohave or Pomo, for
which there is no ISO 639 identifier. Following the recommendation, user
A creates a custom culture for Mohave, and there's a high likelihood
that she'll take the first local-use ID, qaa. User B creates a custom
locale for Pomo, and will very likely use exactly the same ID, qaa. If
they try to exchange their locale bundles, then, the likelihood of a
conflict is high. In a slightly different scenario in which User A
creates a custom locale and then subsequently system admin B pushes a
different custom locale onto A's system, the risks involved could be
rather worse. In contrast, if locales are named using the x-... syntax,
then different locales will most likely have different names (e.g.
x-Mohave and x-Pomo). As a result, the likelihood of a conflict is
minimal, and the risks involved if a conflict does occur are small (two
locales named "x-Mohave" are likely to be only marginally, if at all,
different).
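
The collision scenario can be sketched in a few lines of Python. This is
an illustration under stated assumptions only: the dictionaries stand in
for locale bundles keyed by language tag, and find_conflicts is a
hypothetical helper, not part of any real platform API.

```python
# Sketch only: bundles are modeled as dicts mapping a language tag to a
# locale definition; find_conflicts is a hypothetical helper.

def find_conflicts(bundle_a: dict, bundle_b: dict) -> list:
    """Return tags present in both bundles with differing definitions."""
    shared = bundle_a.keys() & bundle_b.keys()
    return sorted(tag for tag in shared if bundle_a[tag] != bundle_b[tag])

# Following the recommendation, both users take the first local-use ID.
user_a = {"qaa": "Mohave"}
user_b = {"qaa": "Pomo"}
conflicts_q = find_conflicts(user_a, user_b)

# With the x-... syntax, the names are most likely distinct.
user_a_x = {"x-Mohave": "Mohave"}
user_b_x = {"x-Pomo": "Pomo"}
conflicts_x = find_conflicts(user_a_x, user_b_x)
```

In the first case the two bundles clash on qaa; in the second there is
nothing to reconcile.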

Therefore, I and my Microsoft colleagues would prefer to see the
"SHOULD" clauses regarding private-use removed.

Peter

Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division