New Last Call: 'Tags for Identifying Languages' to BCP

Sun Dec 12 21:31:56 CET 2004

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Bruce Lilly

> Moreover, the point is that countries do change, and that use
> of country codes (as provided for in RFC 3066 and in the
> proposed draft) carries with it the inherent instability
> which is characteristic of politics.  A quest for "stability"
> of countries seems Quixotic and oxymoronic.  According to the
> principle of stability as that term is used in defense of the
> draft, I suppose we're all intended to refer to Malawi as
> "Rhodesia" because that's what it (in part) was called 50 years
> ago, or that we're supposed to ignore the breakup of the USSR,
> Yugoslavia, etc., the reunification of Germany, etc.

That is not at all the aim here with respect to stability; rather, the
aim is that a symbolic identifier used for metadata in IT systems should
not change just because some government on a whim says, "We would now
prefer to use 'yz' rather than 'xy' to designate our country."

Sure, there will be changes that we need to deal with; but there's no
reason to subject all implementations, users and data to purely cosmetic
changes in identifiers that are not designed to be read by humans.
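To make the distinction concrete, here is a minimal Python sketch, assuming a registry that never withdraws a region subtag once it is assigned and instead records a replacement when a code is superseded. The `RegionRecord` type, its `preferred` field and the 'xy'/'yz' codes are hypothetical placeholders, not taken from the draft. What it shows is that data tagged with the old code stays valid without being rewritten.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RegionRecord:
    subtag: str
    description: str
    # Hypothetical field: set when the code is superseded; the record itself is never removed.
    preferred: Optional[str] = None


# Hypothetical registry: 'xy' was superseded by 'yz', but 'xy' stays registered.
REGISTRY: Dict[str, RegionRecord] = {
    "xy": RegionRecord("xy", "Example country (former code)", preferred="yz"),
    "yz": RegionRecord("yz", "Example country"),
}


def is_valid_region(subtag: str) -> bool:
    """A subtag remains valid once registered; matching is case-insensitive."""
    return subtag.lower() in REGISTRY


def canonical_region(subtag: str) -> str:
    """Follow one level of replacement, if any, so old and new codes compare equal."""
    record = REGISTRY[subtag.lower()]
    return record.preferred or record.subtag


# Data tagged with 'XY' years ago still validates and matches newer 'yz' data.
print(is_valid_region("XY"), canonical_region("XY") == canonical_region("yz"))
# True True
```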

> A related problem with the use of country codes in language
> tags is that there is not necessarily an inherent relationship
> between language and country borders.

That is not what country IDs within a language tag are intended to
suggest. In fact, if there were inherent relationships, we probably
would never have needed to use country IDs in a language tag.

> The borders of Germany
> have changed many, many times.  If one is referring to the
> German language as spoken by inhabitants of Alsace, using
> country codes would imply that that same language spoken by
> the same people would have been tagged at various times as
> de-DE and de-FR according to where the France-Germany border
> happened to have been determined by politicians of the time.
> That strikes me as being a rather silly way to tag language,
> but that's the precedent set by RFC 1766.

I agree that that's a silly way to tag that language; I disagree that
RFC 1766 suggests I should tag it that way. 

> As far as I can tell,
> the draft doesn't really deal with the issue of changing borders
> or changing country names -- it merely pretends that these
> things don't happen by attempting to declare a snapshot of the
> status at some point in time as being valid for all time.

That may be your reading of the situation, but it is not how it is seen
by those of us who have been working on this spec and examining these
issues closely.

> But the user has indicated that he speaks French, and the
> proposed registry contains a description in English only.
> Where is the implementor supposed to get the *official*
> translation for display?  N.B. under the current (RFC 3066)
> situation, the definitive ISO lists provide an official
> description in French.

Neither RFC 1766 nor RFC 3066 has ever presented "official" translations;
this is no different for RFC 3066bis. Under RFC 3066, one is pointed to
ISO 639-1 and ISO 639-2 to get the alpha-2 and alpha-3 IDs, but it does
not state anywhere that implementors should use the English and French
language names in those ISO standards; exactly the same situation holds
for RFC 3066bis. (Note, btw, that the names listed by ISO 639-1/-2 have
no particular "official" status; they are normative in those standards
to the extent that they indicate what language variety a given ID
denotes, but they do not claim that the particular form of the language
names has any particular status.)

> > > One possibility would be two description fields.
> >
> > Why two?
> 
> There are now two in the ISO lists (and, as noted, in the
> UN list).  I have no objection to more, but I object to
> a reduction.

If anything, I am inclined to object to two: to avoid an Anglo-French
colonial bias, either there should be one name that is simply a
reference name, or the registry should be designed so that it can
accommodate names in as many languages as may be available.
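One way to read the second option is sketched below in Python: each registry record carries a single reference name plus an open-ended set of names keyed by the tag of the language they are written in, so English and French get no special slot. The `LanguageRecord` type and its field names are hypothetical, not part of RFC 3066 or the draft.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LanguageRecord:
    subtag: str
    reference_name: str  # the one name every record is required to have
    # Hypothetical: names in any number of languages, keyed by (lowercase) language tag.
    names: Dict[str, str] = field(default_factory=dict)

    def name_in(self, lang: str) -> Optional[str]:
        """Return the name written in `lang`, or None if nobody has supplied one."""
        return self.names.get(lang.lower())


german = LanguageRecord("de", "German", {"fr": "allemand", "de": "Deutsch"})
print(german.name_in("FR"))  # allemand
print(german.name_in("sw"))  # None
```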

Note that RFC 3066 specifies a registry that does not include French
language names. I suggest that this issue be dropped.

> I have an implementation which (in accordance with RFC 3066)
> uses the official ISO lists. It has provision for displaying
> ISO 639 language tags with their descriptions in either of the
> two languages supported by the official 639 lists, and likewise
> for the ISO 3166 country codes.  

RFC 3066 *does not at any point* suggest, let alone state, that
implementations should use ISO 639 language names or ISO 3166 country
names for UI purposes. IMO, you are creating an issue where none exists.

> The specification of the
> draft is *NOT* compatible with that existing implementation
> because it removes the existing functionality of official
> descriptions in French of language and country codes. As a
> result of that incompatibility,  the newly proposed
> specification does not work with (at least that one)
> existing implementation (but I agree that that is a crucial
> concern).

Display names for languages and countries are not within the scope of
RFC 1766 or RFC 3066. It is preposterous to suggest that this draft is
not compatible with existing implementations of RFC 3066 on that basis.
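A minimal Python sketch of that separation, under stated assumptions: the tag is only an identifier, and any human-readable label comes from localization data the application supplies itself. The `UI_NAMES` table and `display_name` function are hypothetical; a real product would draw such data from whatever locale resources it already ships. If no localized name exists, the sketch falls back to showing the tag rather than any registry description.

```python
from typing import Dict, Tuple

# Hypothetical application-supplied data: (UI language, tag) -> display name.
UI_NAMES: Dict[Tuple[str, str], str] = {
    ("fr", "de"): "allemand",
    ("fr", "de-ch"): "allemand (Suisse)",
    ("en", "de"): "German",
}


def display_name(tag: str, ui_lang: str) -> str:
    """Look up a label for `tag` in the UI language, trimming subtags until one matches."""
    subtags = tag.lower().split("-")
    while subtags:
        name = UI_NAMES.get((ui_lang.lower(), "-".join(subtags)))
        if name is not None:
            return name
        subtags.pop()  # e.g. "de-at" -> "de"
    return tag  # no localized name available: show the identifier itself


print(display_name("de-CH", "fr"))  # allemand (Suisse)
print(display_name("de-AT", "fr"))  # allemand
print(display_name("de-AT", "ja"))  # de-AT
```

Nothing in the tag, or in how it is matched, depends on which display languages the table happens to cover.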

> > There are 6000 languages spoken on Earth, of which
> > perhaps 600 have a standard written form.
> 
> ISO 639 lists about 650, not precisely 6000.

Between ISO 639-1 and ISO 639-2, there are fewer than 400 individual
languages listed. The number 6000 was given as a rough figure, and it is
fairly well known that the number of living languages is on that order.
ISO 639-3 will list over 7000 different individual languages.

> It might be worthwhile considering the differences in the
> way language tags are used, by whom they are used, and for
> what purpose.  There may well be a substantial difference
> between use of a tag to represent an obscure dialect of a
> dead language in a research paper vs. tagging a piece of
> text in one of the core Internet protocols such as SMTP.
> The draft seems to ignore the needs of the core Internet
> protocols (e.g. unbounded tag length which is incompatible
> with those protocols).

IETF language tags are used in a wide variety of applications. The
parties involved in development of this spec (the authors and others)
have examined these issues for the past several years and have arrived
at this architecture.

> > What is supposed to
> > be privileged about English and French?
> 
> They happen to be the languages in which international
> standards (q.v. the ISO and UN lists) are published.

That is true for ISO standards because the official languages of ISO are
English and French. (Russian is also an official language of ISO, but is
not required.) But this spec is not an ISO standard; it is an IETF
standard. If you can point to IETF requirements that IETF specs must
contain English and French, then that would be a legitimate concern. But
you are simply adding localization requirements to a spec for i18n
infrastructure, and I consider that not at all appropriate.

> > > ABNF from the draft:
> >
> > You're technically right, but your underlying claim (that RFC 3066
> > tags are bounded in length) is false, as has been shown
> 
> One part of my claim is that non-private-use RFC 3066 tags
> up to the present time are no longer than 11 octets in length.

That is true only coincidentally, and only at the present time.
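A syntax-only check makes the point concrete. The Python sketch below transcribes the RFC 3066 ABNF into a regular expression; `is_rfc3066_syntax` is a hypothetical helper that does no registry lookup. Because `*( "-" Subtag )` has no upper bound, the grammar itself admits tags of any length, so an 11-octet ceiling is a property of the tags registered so far, not of the specification.

```python
import re

# RFC 3066 ABNF:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """Check a tag against the RFC 3066 grammar (syntax only, no registry lookup)."""
    return RFC3066_TAG.fullmatch(tag) is not None


# The "*" repetition has no upper bound, so arbitrarily long tags are well-formed.
long_tag = "en" + "-abcdefgh" * 10
print(len(long_tag), is_rfc3066_syntax(long_tag))  # 92 True
```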

> As the draft, if/when approved, would close that registration
> process, that limit (unless a longer tag is registered in
> the interim) would apply for all time. 

And so that limit would be a constraint applying for all time to the
'grandfathered' production which concerned you so much.
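The same reasoning, sketched in Python: once a production is a closed list, recognizing it is a set lookup, and its length bound can be computed once and never changes. The five tags below are a small illustrative sample from the IANA registry maintained under RFC 1766 and RFC 3066, not the complete list, and `GRANDFATHERED`/`is_grandfathered` are hypothetical names.

```python
# Illustrative sample only; not the complete list of registered tags.
# Stored in lowercase because tag matching is case-insensitive.
GRANDFATHERED = frozenset({
    "i-klingon",
    "art-lojban",
    "zh-min-nan",
    "en-gb-oed",
    "cel-gaulish",
})

# Computed once: with a closed list, this bound can never change.
MAX_GRANDFATHERED_OCTETS = max(len(tag.encode("ascii")) for tag in GRANDFATHERED)


def is_grandfathered(tag: str) -> bool:
    """Membership test against the closed list."""
    return tag.lower() in GRANDFATHERED


print(MAX_GRANDFATHERED_OCTETS)       # 11
print(is_grandfathered("en-GB-oed"))  # True
```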

Peter Constable
Microsoft Corporation