Addition request: alsatian

Fri Jan 11 03:30:55 CET 2008

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Michael Everson

> >We are tagging languages, not names of languages.
>
> We are an extension of ISO 639, which gives codes for the
> representation of names of languages.

There is one person I know of who has tried to make much of this point to me on a few occasions, but I think we must be clear about this: saying that ISO 639 codes names rather than languages leads nowhere productive.

Strictly speaking, "French" and "français", for example, are distinct names. Clearly nobody has ever intended that distinct names drawn from different source languages should be coded differently (the practice in ISO 639 of giving both an English and a French name for each ID makes that clear).

Now, "Persian" and "Farsi", for example, can be considered two English-language names for the same semantic category, and they are quite plainly distinct. Was ISO 639 intended to code them distinctly? I cannot imagine that this would ever have been considered the intent, as it would serve no useful purpose.

But the above example is relevant to what I think may have been the understanding of some or all of the original creators of ISO 639: that coding in ISO 639 is an operation on the string for a name of a given language that derives a 2- or 3-letter string. Thus, "eng" is a coding of "english", with the implication that (for instance) "xyz" was not a possible candidate coding for a name of the English language. This pertains to the previous example because the coding must choose one name or the other as its base, "persian" or "farsi"; thus they are distinct names, and (in this sense) only one of them is coded.

The end result is that we have IDs that code a language name and, as a result, reference a particular language. In theory, both "persian" and "farsi" could be coded, and in this particular case they were both coded in ISO 639-2, but that is one of 22 cases that are, and have always been, considered exceptional. The 22 differences between ISO 639-2/T and ISO 639-2/B exist for the same kind of reason that Unicode has precomposed and compatibility characters: an expedient compromise in the face of legacy or similar demands, made to obtain the consensus needed to adopt the standard. Those are the only cases of knowingly creating synonymous IDs in ISO 639 (i.e. within a single codespace), and I can assure you that nobody on the current JAC would ever dream of creating a new ID for a language name that would result in *language* synonymy.

In other words, for practical purposes we can consider ISO 639 as coding languages, not specific language names. And even if the original intent of "coding a name" was understood as deriving a string from the string of a conventional name, that is clearly no longer feasible with ISO 639-3. So, while the titles of the standards remain "codes for the representation of names of languages", IMO that is really an anachronism. The current JAC practice of associating an ID with a language itself (not a name) is consistent with the practical results of coding in general (barring the 22 639-2 exceptions) that have been part of ISO 639 all along.

Peter