LANGUAGE TAG REGISTRATION FORM: mn-Mong-CN

Sat Feb 12 10:39:24 CET 2005

> From: Michael Everson [mailto:everson at evertype.com]

> >> On the basis of established precedent, I don't see how the
> >> documentation provided can be considered insufficient.
> >
> > I think this is a variant of the existing complaint: the references have
> > to document the use of Mongolian *in China using Mongolian script*.
> 
> The banknotes do that.
> 
> But it does not differ from Mongolian writing in Mongolia.

It is a *serious* mistake to go from a statement like "We do not know of any difference between Mongolian used in Mongolia versus Mongolian used in China" to the conclusion "there is no need for country IDs CN and MN in language tags for Mongolian". I will repeat what I wrote on Feb 3:

"The important thing for us is not to establish precisely what every distinction is (an endless task involving an ever-changing domain over which different interpretations are possible), but rather to ensure that the intended meaning of any tag is understood by all and for which it is clear, to some minimal level, how to utilize it."

To elaborate on *why* that is the case: language tags need to distinguish not just those abstract entities out in the real world that we call "languages"; they also need to distinguish **uses** of language in content and in all kinds of digital language resources.

This is the whole point behind the debate we had over "es-americas": the point wasn't whether there was an identifiable dialect corresponding to that tag; rather, the point was that there are scenarios in which language resources have a linguistic property that *that* tag reflects and that need to be distinguished from other language resources.

In the Mongolian case, we cannot dictate that nobody should ever have, say, a terminology database in which Mongolian terms used in China are distinct from Mongolian terms for the same concepts used in Mongolia. In other words, we may not know of any linguistic difference between "mn-Mong-MN" and "mn-Mong-CN", but we absolutely cannot assert that there can never be a need for someone to distinguish language resources using "mn-Mong-MN" and "mn-Mong-CN".
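
A minimal sketch of that scenario, assuming a hypothetical terminology store keyed by language tag. The names `TERMS` and `lookup`, the concept IDs, and the placeholder term strings are all illustrative assumptions rather than real data, and the fallback shown is a simple truncation chosen for the example, not a prescribed matching algorithm:

```python
# Hypothetical terminology store: concept ID -> {language tag -> term}.
# The term strings are placeholders, not real Mongolian vocabulary.
TERMS = {
    "concept-001": {
        "mn-Mong-MN": "<term used in Mongolia>",
        "mn-Mong-CN": "<term used in China>",
    },
    "concept-002": {
        "mn-Mong": "<term shared by both regions>",
    },
}


def lookup(concept_id, tag):
    """Return the term stored under the most specific matching tag.

    Tags are compared case-insensitively. If the full tag has no entry,
    trailing subtags are dropped one at a time
    (mn-Mong-CN -> mn-Mong -> mn) until a match is found.
    Returns None when nothing matches.
    """
    entries = {k.lower(): v for k, v in TERMS.get(concept_id, {}).items()}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in entries:
            return entries[candidate]
        subtags.pop()
    return None
```

Where the two regions share a term, a single entry under "mn-Mong" serves both; where they differ, only the region subtag can keep the entries apart, whether or not anyone has documented a general linguistic difference.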

To take another example, from a descriptive-linguistic perspective, I wouldn't expect there to be any difference between fr-CI and fr-GH. But if (e.g.) in some obscure commercial domain there is some difference between a term used for a concept in Côte d'Ivoire and Ghana, then there is a legitimate reason for the use of fr-CI and fr-GH in tagging content, and we absolutely cannot assert that such situations cannot exist.

I fully appreciate the concerns of linguistic purity, being a linguist myself, but we are not doing descriptive linguistics here; we are doing IT implementation, and over-zealous application of descriptive-linguistic ideals can lead us into incorrect thinking. Language tags are not intended to be documentation of human knowledge of languages; they are intended as metadata elements for distinguishing linguistic properties of language resources and general linguistic content.

Peter Constable