New Last Call: 'Tags for Identifying Languages' to BCP

Mark Crispin mrc at CAC.Washington.EDU
Mon Dec 13 01:20:43 CET 2004

On Sun, 12 Dec 2004, Bruce Lilly wrote:
> If by international agreement, 'yz' becomes the designation
> for that country, then it is rather silly to stick one's
> fingers in one's ears and shout "NA-NA-NA-NA-NA I don't want
> to hear you".

What is silly is saying that every language tag has to have a date/time 
attribute associated with it so that computer software managing that text 
knows the language of that text.

But that is precisely what you are advocating.

> It's rather silly to change that correspondence simply because
> a few people are piqued that international agreement has been
> reached to change a few 2-letter codes.

It's bad enough that TLDs get recycled.

It is a disaster for language identifiers to get recycled.  Something has 
to make those identifiers unique.  Your notion will force the inclusion of 
a date/time stamp in language tags, to restore the uniqueness that you are 
so excruciatingly eager to abolish.

> Never
> mind the shortcomings of that particular example; consider
> "de-DE" -- does that mean Germany as it exists today, West
> Germany as it existed 25 years ago, Germany as it existed
> in the 1930s, the 1900s, ...?

For the 98% case, it does not matter at all.

But it does matter if, one day, "DE" becomes Denmark.

> As far as I can tell, the draft pretends that the meaning
> of "CS" hasn't changed, and would in fact change the meaning
> of the currently valid RFC 3066 language tag "sr-CS".

No, it restores the previous meaning of sr-CS.

> It is very different; under the proposed draft, there is only
> an English definition, somebody wishing to provide a French
> definition finds that he has none and must resort to an
> unofficial translation.

Why is the situation for French different from somebody wishing to 
provide a Lower Slobbobian definition?

> SO where are the French definitions?

Ask a person who is bilingual in English and French to provide one.

> Well, sure. But the name is an important thing by itself.
> It is rather pointless to ask a user to indicate the
> language of a piece of text by selecting from a list "AB, ACE,
> ACH,..., ZHA, ZUL, ZUN" -- the user doesn't normally refer to
> languages by codes. It's quite a different matter to ask the
> user to select from "Abkhaze, Aceh, Acoli,..., Zhuang (Chuang),
> Zoulou, Zuni".

Abkhaze, Aceh, Acoli,..., Zhuang (Chuang), Zoulou, and Zuni are not 
language tags.  So what's your point?

>> Note that the RFC 3066 specifies a registry that does not include French
>> language names. I suggest that this issue should be dropped.
> Yes, the current IANA registry has that problem for
> the non-ISO-based tags only. If the registry is to be
> changed to subsume ISO codes as well, that defect should
> be remedied.

Why is it a problem?  Why is it a defect?

> On the contrary, it is preposterous to suggest that codes
> will be attached to text by magic

Here is where you are misled.  Many of these tags are embedded within the 
text itself.  That text may long outlive its author in an archive.

> My concern
> is the elimination of the French definition in the first place.

Why is this a problem?

-- Mark --
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
