ON LANGUAGE NAMES /// RE: Results of Duplicate Busters Survey #2

Lang Gérard gerard.lang at insee.fr
Fri Sep 5 17:38:41 CEST 2008

1-It is not true that no one else cares much about these questions: I, for one, care very much about the question of language names.
2-Let me remind you that the generic title of ISO 639 is "Codes for the representation of language names", so ISO 639 alpha-2 and alpha-3 code elements do not directly code a representation of languages, but merely a representation of language names, which is not exactly the same thing!
3-In fact, ISO 639 (1988) "Code for the representation of names of languages" was very explicit on this point and provided tables giving, for each of its 136 entries:
-an alpha-2 code element;
-a UNIQUE English language name;
-a UNIQUE French language name;
-a UNIQUE original name (that is, a script form of the autonym [romanized, if necessary], the autonym being the name of the language in that language itself).
And this original name of the language (so, clearly, a representation of the name of the language) was the basic information to be coded by ISO 639, as stated in clause 4.1 "Form of the language (name) symbols", which reads:
"The language (name) symbols are created from the original language name, if in Latin spelling, or from the language name converted in Latin characters." Admittedly, one could argue that the French or English translation of the original name of a language could be described as a conversion of this original name into Latin characters; but this was clearly not the intended interpretation, which was about the romanization of the original name. (There was no question of languages having only a spoken form and no written form, because the "scope and field of application" was terminology and lexicography, and all 136 language names coded were those of written languages.)

4-ISO 639-2 (1998) "Codes for the representation of names of languages-Part 2: Alpha-3 code" was far less explicit. Its tables gave, for each of its 430 entries:
-an alpha-3 B(ibliographic) code element;
-an alpha-3 T(erminologic) code element;
-a UNIQUE English language name;
-a UNIQUE French language name.

Alas, no original language name was given in the tables, and clause 4.1 "Form of language (name) codes" is not very explicit, mentioning several possible bases:
-preference of the countries using the language;
-established usage of codes in national and international bibliographic databases;
-vernacular form of the (name of the) language;
-English (but not French) form of the (name of the) language.

This situation created much confusion, which was considerably worsened when, after the publication of ISO 639-2, the ISO 639 RA/JAC began adopting many variant names (generally in English) for ISO 639-2 entries.

5-ISO 639-1 (2002) "Codes for the representation of names of languages-Part 1: Alpha-2 code", which replaced ISO 639 (1988), was much better in this respect.
42 entries were added to the 136 previous ones, and the tables gave:
-an alpha-2 code element (with 3 changes and 1 correction from ISO 639 (1988));
-a (most of the time, but alas not systematically) UNIQUE English language name;
-a (most of the time, but alas not systematically) UNIQUE French language name;
-a (most of the time, but not systematically) UNIQUE indigenous language name.

Clearly, an "indigenous language name" should be the same as an "original language name", and should also be UNIQUE. Yet there are variants of the indigenous name in 9 cases, none of which is particularly convincing as to the need for such variants!
Especially since clause 4.1 "Form of the language (name) identifier" reads:
"The language (name) identifiers are derived from the language name. Each identifier is based on THE indigenous name of the language, or the preference of the communities using the language"

6-ISO 639-3 (2007) "Codes for the representation of names of languages-Part 3: Alpha-3 code for comprehensive coverage of languages" is much closer to the fuzzy considerations of ISO 639-2.
Clause 4.1 "Form of the language (name) identifier" reads:
"Language (name) identifiers are not intended to be an abbreviation for the name of the language, but to serve as a device to identify a given language uniquely. With thousands of languages, many pairs of which have similar names, it is not possible to provide identifiers that resemble a language name in every case. In many cases, language identifiers do bear some resemblance to a name for a language, but this is not guaranteed. Many languages have alternate names used by different internal or external communities. In such cases, the form of the language identifier does not imply that a name resembling the language identifier is considered to be preferred"
Given such considerations, it would seem urgent to collect as many "original language names" or "indigenous language names" as possible (with a romanization if the script form is not written in a variant of the Latin alphabet, or a transcription in the International Phonetic Alphabet (IPA) when the language has only a spoken form and no written form), so as to ensure maximum security in establishing the identification, real existence and uniqueness of the language (name), and a sound basis for choosing the alpha-3 code element to be assigned to each ISO 639-3 entry.
But the only guidance on this inside ISO 639-3 is definition 3.4 "name; reference name; appellation", which reads "linguistic expression used to designate an individual concept"; this does not help us much, even if 5 notes give some further information on how to deal with it.

Clause 5 "Language code tables" reads:
"The language code for this part of ISO 639 consists of the following tables of information:
-Table of language code elements;
-Table of mapping of macrolanguage code elements to individual language code elements.
These tables are published by the Registration Authority for ISO 639-3 and are available online."
So, in fact, ISO 639-3 itself publishes no list, and gives no description of the content of the electronic tables available from the ISO 639-3/RA.

These tables are very well done, but give only:
-an ISO 639-1 alpha-2 code element, when existing;
-an ISO 639-2 alpha-3T code element, when existing;
-an ISO 639-3 code element;
-a (NOT QUALIFIED: Reference name ?) UNIQUE language name;
-a scope (Individual/Macrolanguage);
-a type (Living/Extinct/Historic);
-a link to Ethnologue.

This information is useful, but other basic information, such as the original or indigenous names of ISO 639-1 entries and the autonyms now being collected by the ISO 639-3/RA, is not available.
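To make the contrast concrete, here is a minimal Python sketch that models the fields listed above for an ISO 639 (1988) entry and for an entry of the ISO 639-3/RA tables (omitting the Ethnologue link for brevity). The class names, field names and the helper `carries_original_name` are hypothetical, chosen only for illustration; they are not the column names of any standard or RA file, and the German sample values are illustrative.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical record types: field names are illustrative only and are not
# the column names used by ISO 639 (1988) or by the ISO 639-3/RA tables.
@dataclass(frozen=True)
class Iso639_1988Entry:
    alpha2: str            # alpha-2 code element
    english_name: str      # UNIQUE English language name
    french_name: str       # UNIQUE French language name
    original_name: str     # UNIQUE original name (autonym, romanized if necessary)


@dataclass(frozen=True)
class Iso639_3Entry:
    part1: Optional[str]   # ISO 639-1 alpha-2 code element, when existing
    part2t: Optional[str]  # ISO 639-2 alpha-3 T code element, when existing
    part3: str             # ISO 639-3 code element
    name: str              # UNIQUE but unqualified language name
    scope: str             # "Individual" or "Macrolanguage"
    lang_type: str         # "Living", "Extinct" or "Historic"
    # No field for an original/indigenous name or autonym, mirroring
    # what the published tables leave out.


def carries_original_name(record_type: type) -> bool:
    """Return True if the record type has a field for the language's own name."""
    return any(f.name == "original_name" for f in fields(record_type))


german_1988 = Iso639_1988Entry("de", "German", "allemand", "Deutsch")
german_3 = Iso639_3Entry("de", "deu", "deu", "German", "Individual", "Living")

print(carries_original_name(Iso639_1988Entry))  # True
print(carries_original_name(Iso639_3Entry))     # False
```

The point of the sketch is structural: the 1988 record has a slot for the name of the language in that language, whereas the ISO 639-3 record has only a single unqualified name.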

So the questions of the choice of the "language name", of the "representation of the language name" and of the "coding of the representation of the language name", which mediate between the "language name" and its "code element(s)" inside ISO 639, and between the "language" itself and its "language tag(s)", still remain to be worked out. That is why I care very, very much about the question of "language names".

Best regards,
Gérard LANG


-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Michael Everson
Sent: Friday, 5 September 2008 15:30
To: ietflang IETF Languages Discussion
Subject: Re: Results of Duplicate Busters Survey #2

Doug has asked me to rule on his survey.

I am happy with the 639-3 name only. However, I can accept that the
639-2 name could be added alongside if Frank Ellerman insists (since no one else seems to care much).

Frank, please state your preference, so we can be done with this.

Michael Everson * http://www.evertype.com
