New Last Call: 'Tags for Identifying Languages' to BCP

Sun Dec 12 19:10:00 CET 2004

>  Date: 2004-12-11 10:48
>  From: "Peter Constable" <petercon at microsoft.com>
>  To: ietf-languages at alvestrand.no, ietf at ietf.org
>  
> > -----Original Message-----
> > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> > bounces at alvestrand.no] On Behalf Of Bruce Lilly

> > My comments are in response to the "New Last Call" made on
> > the ietf-announce list.  They are in response to the text which
> > accompanied that new last call and the text of
> > draft-phillips-langtags-08.txt dated November 2002.  The
> > specific claim that accessibility has been a problem was made in
> > the text accompanying the new last call
> 
> I don't know where the statement accompanying the announcement came from,

According to the "New Last Call" issued by the IESG Secretary,
the text is "Author's discussion of drivers for this work".

> You singled out that one point to comment on as though it were the main factor.

I mentioned a matter which was repeatedly indicated as a
factor for existing implementations and with which I
strongly disagree.

There are points with which I do not necessarily disagree,
and there are points which I have not yet had time to
study in detail, due to the surprise of the announcement
of an impending decision (I do not understand why no
announcement of work on an RFC 3066 replacement was made
to the ietf-822 list, especially as the core Internet
protocols discussed there are affected by this draft),
the shortness of the time before a decision (deadline for
comments was given as 5 Jan 2005), and the impending
holidays.

[regarding the proposed registry vs. internationally-
standardized ISO lists for subtag definitions]
> It is certainly the case that only it should be consulted for determining what sub-tags are valid with what denotation, which was the intent.

That is a problem for existing implementations of RFC 3066
tags, which can obtain official, internationally agreed
descriptions of the codes in two languages.

> By looking in the sub-tag registry. If ISO changed the meaning of "US" to something other than what it is now, its meaning for purposes of use in an IETF language tag would not change, because it would remain stable in the sub-tag registry. You would be fairly well protected against the whim of politicians.

OK, continuing your hypothetical example and its relationship
to language, suppose that there is another civil war and
that what now corresponds to "US" is split into Blue America
and Red America.  Further suppose that in due course ISO
assigns some other code to one of those countries and retains
"US" for the other, and that that happens after the proposed
registry is set up with a definition for "US" and some
description referring to the "old" use.  Now suppose that one
wishes to produce an appropriate language tag for the text
"moral values" (which clearly has different meaning in Blue
America (telling the truth, admitting to mistakes, etc.) and
in Red America (imposing totalitarian control over others)).
How specifically would the proposed registry handle such a
change in the meaning of "US", and how would the registry
help differentiate the meaning of a 1990s "en-us" tag from
that of the hypothetical time described?

I suspect that it won't help, and I recommend review of
how another artifact of politics (viz. time zones) is
handled by the (unofficial) database of time zones
maintained at ftp://elsie.nci.nih.gov/pub/tzdata2004g.tar.gz.
The format used handles multiple changes in definitions
that went into effect at different times, something that
the proposed registry doesn't appear to handle.
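
To make the comparison concrete, here is a minimal sketch of a
registry entry that carries dated definitions, loosely in the spirit
of the time zone data. It is not the tz file format and not anything
in the draft; the HISTORY table, its dates, its descriptions, and the
describe() helper are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history for one subtag: (effective date, description),
# kept sorted by effective date. Dates and wording are invented.
HISTORY = {
    "US": [
        (date(1990, 1, 1), "old meaning of US"),
        (date(2010, 1, 1), "new meaning of US"),
    ],
}

def describe(subtag, when):
    """Return the description in effect for `subtag` on date `when`."""
    entries = HISTORY[subtag.upper()]
    effective_dates = [effective for effective, _ in entries]
    # Index of the last entry whose effective date is <= `when`.
    index = bisect_right(effective_dates, when) - 1
    if index < 0:
        return None  # no definition was in effect yet
    return entries[index][1]
```

With such a structure, a tag generated in the 1990s and a tag
generated after the hypothetical change can each be resolved against
the definition that applied when the tag was produced, provided the
date of the tagged content is known.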

> > But if the proposed new registry's description of "CS" says
> > "foo" and the ISO standard code list says "bar", what's
> > an implementor supposed to present to a user as *the*
> > description associated with "CS"?
> 
> The *meaning* of the sub-tag is determined by the sub-tag registry. If you want human-readable descriptors,

The draft says that the proposed registry will contain a
description, in English (only).

> you already have to look beyond the ISO standards for anything more than English and French

But existing RFC 3066 implementations can get official
descriptions in *both* of those languages; the proposal
would adversely affect those existing implementations by
eliminating the French description.

Of course, it is a more serious defect of the proposal
that it would fail to reflect internationally-agreed
codes and would fail to keep pace with changes...

> it would not be new that you have to look beyond the registry itself to decide what human-readable descriptors you should provide in a product.  

It would be new that one could not find a standard
(i.e. official) French-language description in the
list of codes.

> > One possibility would be two description fields.  But the
> > registry would need a charset closer to ISO-8859-1 than
> > to ANSI X3.4 as currently specified.  Or an encoding
> > scheme.
> 
> Personally, I don't see the value in something like that. Given the intent to have a registry that can be machine-readable, changing its charset from ANSI X3.4 in order to gain descriptors in just one more language is not worth it IMO. 

Fine, use utf-8, which encompasses ANSI X3.4 and
ISO-8859-1 (plus others).  The point is that ANSI
X3.4 is inadequate.
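
A small sketch of the charset point, using an arbitrary French word
as a stand-in for a French-language description (the sample strings
are assumptions, not registry content). Note that utf-8 covers the
ISO-8859-1 repertoire but encodes non-ASCII characters with different
bytes; only the ANSI X3.4 range is byte-for-byte identical.

```python
# Stand-in for a French description containing a non-ASCII letter.
description = "français"

# ANSI X3.4 (US-ASCII) cannot represent the c-cedilla.
try:
    description.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Both ISO-8859-1 and UTF-8 can represent it, with different bytes.
latin1_bytes = description.encode("iso-8859-1")  # one byte per character
utf8_bytes = description.encode("utf-8")         # two bytes for the c-cedilla

# Text limited to ANSI X3.4 is unchanged when stored as UTF-8.
ascii_text = "French"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```

So a registry stored as utf-8 would leave every existing ANSI X3.4
description byte-identical while allowing a second description field.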

> Speaking at least for Microsoft, we're interested in having descriptors in far more than two languages, and we certainly would not blindly base the descriptors we present to our customers solely on what a registry provides, no matter what its charset.

Surely the desire to go from two (the current situation per
RFC 3066) to "more than two" indicates that decreasing
to one (as in the draft proposal) is heading in the
wrong direction.  It certainly invalidates the claim
that the proposal is compatible with existing
implementations, at least one of which does make use
of the descriptions currently provided in both
languages in the ISO lists specified by RFC 3066.

> As you know, the 'grandfathered' production is loose in the ABNF given in the draft, but is very tightly constrained elsewhere in the draft: it is limited to only items registered under RFC 1766 or RFC 3066 up to the date of acceptance of this proposed spec. (In fact, only a subset of those, all explicitly identified in the sub-tag registry.) On the date of acceptance, you will be able to know precisely what the valid tags that fit under the 'grandfathered' production are and will forever be, and it is 100% guaranteed that none of them will have any of the forms that seem to concern you.

I see no guarantee that a future revision won't use
such forms on the basis that they are permitted
by the ABNF.
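
The gap between the two kinds of constraint can be sketched as
follows. The LOOSE pattern is an assumed stand-in for a permissive
grammar rule, not a transcription of the draft's ABNF, and
GRANDFATHERED holds only a few tags registered under RFC 1766 or
RFC 3066, not the complete list. Both function names are hypothetical.

```python
import re

# Assumed permissive pattern: 1-3 letters, then one or two subtags of
# 2-8 alphanumerics. Illustrative only; not copied from the draft.
LOOSE = re.compile(r"[A-Za-z]{1,3}(-[A-Za-z0-9]{2,8}){1,2}")

# A few previously registered tags; deliberately incomplete.
GRANDFATHERED = frozenset({"i-klingon", "art-lojban", "zh-guoyu"})

def matches_loose_grammar(tag):
    """True if the tag fits the permissive pattern."""
    return LOOSE.fullmatch(tag) is not None

def is_grandfathered(tag):
    """True only if the tag is in the fixed list (case-insensitive)."""
    return tag.lower() in GRANDFATHERED
```

A parser built from the grammar alone accepts every string for which
matches_loose_grammar() is true; only the prose restriction, as
is_grandfathered() models it, narrows that to the fixed list. That
prose is what a later revision could change without touching the ABNF.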