New Last Call: 'Tags for Identifying Languages' to BCP

Mon Dec 13 01:35:04 CET 2004

>  Date: 2004-12-12 15:34
>  From: John Cowan <jcowan at reutershealth.com>
>  To: ietf-languages at alvestrand.no, ietf at ietf.org

> Of course countries change, and then the numeric country codes change
> as well.  The point is that the alpha codes change for political reasons
> when there has been *no* change in the underlying country:  Romania's
> 3-alpha code changed from ROM to ROU without any change in Romania at all.
> The CS case is particularly gratuitous, as its denotation changed from
> "Czechoslovakia" (a no longer existent country) to "Serbia and Montenegro"
> (a newly created country).

There is a limited supply of 2-letter codes and the supply
of 3-digit codes is only slightly greater.  Reassignment of
codes from such a limited supply is inevitable.  Better to
deal with the fact of tides than to try to command the tide
not to flow in.
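
For scale, a rough count of the raw supply. This is only a
sketch: it counts every possible combination, so the number of
codes actually available for assignment is smaller. The variable
names are mine.

```python
import string

# Raw ceilings only: every possible combination, before any
# codes are reserved or withdrawn.
two_letter = len(string.ascii_uppercase) ** 2   # 26 * 26 = 676
three_digit = 10 ** 3                           # 000 through 999 = 1000

print(two_letter, three_digit)
```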

> > As far as I can tell,
> > the draft doesn't really deal with the issue of changing borders
> > or changing country names -- it merely pretends that these
> > things don't happen by attempting to declare a snapshot of the
> > status at some point in time as being valid for all time.
> 
> No, it attempts to freeze the code-to-country mapping at a single
> point.  New countries or changes in old countries should involve only the
> additions of codes, not the reuse of old codes.

Too late. King Canute commands the tide not to come in, but
his feet still get wet.  Better to deal with such change
appropriately rather than commanding countries (or
international standards bodies) not to change.
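
One possible way of dealing with reassignment, offered only as
an illustration and not as anything the draft specifies, is to
interpret a code relative to a date. The table, the function
name, and the dates below are hypothetical placeholders, not
registry data.

```python
from datetime import date

# Hypothetical table: (code, valid_from, valid_to, denotation).
# valid_to is exclusive; None means "still assigned".
# The dates are approximate placeholders for illustration only.
ASSIGNMENTS = [
    ("CS", date(1974, 1, 1), date(1993, 1, 1), "Czechoslovakia"),
    ("CS", date(2003, 7, 1), None, "Serbia and Montenegro"),
]

def denotation(code, when):
    """Return what `code` denoted on `when`, or None if unassigned."""
    for c, start, end, name in ASSIGNMENTS:
        if c == code and start <= when and (end is None or when < end):
            return name
    return None

print(denotation("CS", date(1990, 1, 1)))
print(denotation("CS", date(2004, 12, 13)))
```

With a lookup of this kind, a reused code is ambiguous only when
the date of the tagged material is unknown.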

> I don't know.  Where is the implementor supposed to get the
> official German, or Catalan, or Mandarin translations?
> Not in the ISO registry, for sure.  To say nothing of the
> cases where no official translations exist.

I'm concerned not with translations but with the
definitions. Currently the definitions are available
in French and English.

> > It might be worthwhile considering the differences in the
> > way languages tags are used, by whom they are used, and for
> > what purpose.  There may well be a substantial difference
> > between use of a tag to represent an obscure dialect of a
> > dead language in a research paper vs. tagging a piece of
> > text in one of the core Internet protocols such as SMTP.
> 
> That count does not include dead languages.  Whether it includes
> dialects is a matter of terminology.

Fine. The point is that the draft provides for language tags
that are so long that they cannot be used with the core
Internet protocols. A tag associated with audio media doesn't
need a means to indicate script or other orthography -- those
are irrelevant for spoken material.  RFC 3066's provision for
registration worked well. Removing that requirement -- as the
draft would do -- necessitates a specific upper bound on
tag length that will work with existing core protocols, to
replace the reviewer, Area Director, and community review
process that ensures that currently registered tags work with
those protocols.
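
A minimal sketch of the kind of mechanical check a fixed upper
bound would make possible. The limit used here is a made-up
number chosen only for illustration; it is not taken from the
draft or from any RFC, and the function name is mine.

```python
MAX_TAG_LENGTH = 32  # hypothetical bound, for illustration only

def fits(tag, limit=MAX_TAG_LENGTH):
    """True if the tag is no longer than `limit` characters."""
    return len(tag) <= limit

short_tag = "en-US"
# An arbitrarily long run of subtags, built only to exceed the limit.
long_tag = "-".join(["x"] + ["abcdefgh"] * 5)

print(fits(short_tag), len(short_tag))
print(fits(long_tag), len(long_tag))
```

Without a stated bound, there is nothing for such a check to
test against, and the judgment falls back on review.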