Stop requiring endonyms (Was: RFC 4645bis: making 'pes' and 'prs' extlangs) /// Better use autonyms

Fri Dec 12 12:01:22 CET 2008

Dear Addison Phillips, 
Thank you very much for answering my message.
Let me add some comments.

1-Surely, RFC 4646 is an IETF BCP about identifying languages with language tags. Certainly, ISO 639 is an ISO standard about codes for the representation of names of languages. Evidently, there must be a link between the two.
And so, it is a perfectly good solution that RFC 4646, as an IETF "de facto" standard, uses ISO 639, as an ISO "de jure" standard, for reference.
It is also plainly understandable that the needs of RFC 4646 could eventually have some retroactive influence on the maintenance of ISO 639, but certainly not to the point of contradicting the spirit, the history or even the precise letter of the text of ISO 639, as is, in my opinion, now the case. For example, the fact that ISO 639/RA-JAC issued a statement about "freezing" ISO 639-1 in the interest of RFC 4646 (which directly contradicts the letter of the text of this standard) is an example of what should certainly not be accepted. So is the fact that ISO 639-3 is not presented as a bilingual English/French face-à-face standard (as are ISO 639, ISO 639-1, ISO 639-2 and ISO 639-5). This is a grave obstacle to comprehension, because the English text is very ambiguous about the fundamental question of what ISO 639-3 code elements represent (language names, as the general title of ISO 639 says, or languages directly, which should not be the case), so that "manipulations" of "reference names" occur too frequently.

2-And, in fact, let me insist that "reference name" is not a general concept used by ISO 639, as (the letter of) your message seems to assume.
ISO 639 uses "original name" as the major basis for its coding-representation scheme.
ISO 639-1 uses "indigenous name" for its coding-representation scheme.
ISO 639-2 uses "established usages by bibliographic databases" as the major basis for its "bibliographic coding-representation scheme (?)" and "vernacular form of the language" as the major basis for its "terminologic coding-representation" scheme.
Only ISO 639-3 uses "reference name" (defined in 3.4 as "initially established by Ethnologue") as the major basis for its coding-representation scheme (not systematically, if you carefully read clause 4.1, but the letter of that clause is certainly not to be taken "cum grano salis"!).
I am obliged to remark that, considering that "Ethnologue" is a clear reference to ethnology, and considering what ethnology is and the evident link between linguistics and ethnology (as clearly evidenced by David Dalby in the formulation "Speech communities"), it is more than astonishing that the reference names provided by Ethnologue are almost always only the English version (translation?) of the name of the language considered, and are not derived from the genuine autonyms.

3-Considering specifically the ISO 639-3 code element "fas", whose "supposed" reference name is "Persian" (I say "supposed" because the ISO 639-3/RA, whose table has a column titled only "language name", never answered when I asked whether this "language name" is in fact systematically identical to the corresponding "reference name"), there is no visible link between the string "Persian" and the string "fas" that "codes the representation of the ["reference"] name".
So, the choice of such a "reference name" is rather curious, given that "fas" in ISO 639-3 identifies the same language name as "fa" in ISO 639-1, whose "indigenous name" is "fârsy", which gives an evident visible link between this indigenous name and the code element that "codes the representation of the language ["indigenous"] name."
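
As an illustration only, the "visible link" argument of point 3 can be sketched in Python. The test used here (the letters of the code element appear, in order, in the romanized name once diacritics are removed) is an assumption made for this sketch and is not a rule stated by any part of ISO 639; the function names fold and has_visible_link are hypothetical.

```python
import unicodedata


def fold(name: str) -> str:
    """Lower-case a romanized name and drop diacritics and non-letters."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(
        ch for ch in decomposed
        if ch.isalpha() and not unicodedata.combining(ch)
    ).lower()


def has_visible_link(code: str, name: str) -> bool:
    """Assumed test: the code's letters occur in order within the name."""
    letters = iter(fold(name))
    # 'ch in letters' consumes the iterator, which enforces the ordering.
    return all(ch in letters for ch in code.lower())


# Code elements and names exactly as quoted in this thread.
pairs = [
    ("fa", "Farsi"),     # ISO 639 (1988), "original name"
    ("fa", "fârsy"),     # ISO 639-1, "indigenous name"
    ("fas", "fârsy"),    # ISO 639-3 code against the ISO 639-1 indigenous name
    ("per", "Persian"),  # ISO 639-2/B code against the English name
    ("fas", "Persian"),  # ISO 639-3 code against the reference name
]
for code, name in pairs:
    print(code, name, has_visible_link(code, name))
```

Under this assumed test, only the last pair ("fas" against "Persian") fails, which is the mismatch described in point 3.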

4-This is (one of the reasons) why ISO 639-4, which will give the general methodology for building ISO 639 code elements, should be precise about these questions.

Best regards.
Gérard LANG

-----Original Message-----
From: Phillips, Addison [mailto:addison at amazon.com]
Sent: Wednesday, December 10, 2008 17:27
To: Lang Gérard; John Cowan; Stephane Bortzmeyer
Cc: ietf-languages at iana.org
Subject: RE: Stop requiring endonyms (Was: RFC 4645bis: making 'pes' and 'prs' extlangs) /// Better use autonyms

Please note:

1. RFC 4646 incorporates names defined by ISO 639 automatically. We don't make the initial descriptions up ourselves.

2. RFC 4646 also provides a registration mechanism whereby *any* description (including non-Latin script descriptions, autonyms, endonyms, anagrams, and gibberish) may be registered.

3. RFC 4646's proposed successor adds some additional requirements. It ensures that at least the ISO 639 Reference Name appears in the record. In the case of 'fa', if the reference name were 'Farsi' (it is not), then that name would appear and appear first. Note that this document says the following (which is very similar to text in RFC 4646):

---
The 'Description' field is used for identification purposes. Descriptions SHOULD contain all and only that information necessary to distinguish one subtag from others that it might be confused with. They are not intended to provide general background information, nor to provide all possible alternate names or designations. 'Description' fields don't necessarily represent the actual native name of the item in the record, nor are any of the descriptions guaranteed to be in any particular language (such as English or French, for example).
---

4. The important point is that people make registration requests when they want some change (addition, modification) or comment on specific proposed records when those are requested (such as additions of new records or proposed registrations). If you feel that changes should be made wholesale to the incoming ISO 639-3 records, your comments would be best directed at ltru@ and draft-ietf-ltru-4645bis. You should also probably comment on draft-ietf-ltru-4646bis (but if you are going to make comments, you should make them NOW, as we are beyond Last Call).
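
To make points 2 and 3 above concrete, here is a minimal Python sketch of a record whose descriptions always start with the ISO 639 reference name, followed by any registered additions. It is not the actual registry file format; the class SubtagRecord and its methods are hypothetical names introduced only for this illustration, and the additional descriptions in the usage lines are examples, not registry contents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubtagRecord:
    """Hypothetical model of one language subtag record."""
    subtag: str
    reference_name: str  # incorporated automatically from ISO 639
    extra_descriptions: List[str] = field(default_factory=list)

    def register_description(self, text: str) -> None:
        """Add a description requested through registration, without duplicates."""
        if text != self.reference_name and text not in self.extra_descriptions:
            self.extra_descriptions.append(text)

    def descriptions(self) -> List[str]:
        """Reference name first, then registered descriptions in request order."""
        return [self.reference_name, *self.extra_descriptions]


record = SubtagRecord(subtag="fa", reference_name="Persian")
record.register_description("Farsi")   # illustrative request only
record.register_description("فارسی")   # non-Latin descriptions are allowed
record.register_description("Farsi")   # repeated request is ignored
print(record.descriptions())
```

The design choice in this sketch is that the reference name can never be displaced: registrations only ever append to the list.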

What I'm basically saying is: it does no good to whine about this topic. If you want something, you have to register it yourself. But then, that's what the registration process is for: requesting stuff. If you want an automagic mechanism, you need it to be written into the RFC.

Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages- 
> bounces at alvestrand.no] On Behalf Of Lang Gérard
> Sent: Wednesday, December 10, 2008 1:13 AM
> To: John Cowan; Stephane Bortzmeyer
> Cc: ietf-languages at iana.org
> Subject: RE: Stop requiring endonyms (Was: RFC 4645bis: making 'pes'
> and 'prs' extlangs) /// Better use autonyms
> 
> 0-In my opinion, when studying the identification and denomination of
> languages, the best solution is to use "autonyms" (i.e., the name
> given to the considered language in that language itself), which
> certainly are a kind of endonym.
>  And so, I consider that ISO 639-4, which is to give the general
> methodology for building and choosing Latin-alphabet character strings
> of fixed length (alpha-2 for ISO 639-1, alpha-3 for ISO
> 639-2, 3 and 5), should use these autonyms along the lines of the
> following points 1 and 2.
> 
> 1-If the considered language is a written language such that no valid
> script for this language is written with a variant of the Latin
> alphabet, then a romanization should be used to obtain a "written
> romanized autonym" that represents a very good basis for sure
> identification and denomination of the language, as well as a source
> for building a code element for the representation of the name of the
> considered language, or a tag to represent this language.
> 2-In the case where the considered language is not a written language,
> we have to use a "spoken autonym", and a phonetic transcription using
> the International Phonetic Alphabet (IPA) should be made to get a
> representation, possibly to be further simplified by "romanization",
> as a good basis for identification and a source for building a code
> element for the representation of the name of the considered language,
> or a tag to represent this language.
> 
> 3-In the specific case of Persian/Farsi:
> *ISO 639 (1988) gives the alpha-2 code element "fa" to represent the
> language name whose English version is "Persian", whose French version
> is "perse" and whose "original version" is "Farsi", which certainly is
> a romanization of the autonym.
> *ISO 639-1 (2002) gives the alpha-2 code element "fa" to represent the
> language name whose English versions are "Farsi, Persian", whose French
> versions are "farsi, perse" and whose (unique) indigenous version is
> "fârsy", which is clearly another romanization of the same autonym.
> *ISO 639-2 (1998) gives the alpha-3/B code element "per" and the
> alpha-3/T code element "fas" (recommended) to represent the language
> name whose English version is "Persian" and whose French version is
> "perse".
> *ISO 639-3 (2006) gives the alpha-3 code element "fas", considered as
> identical to ISO 639-2/T (and equivalent to the ISO 639-1 "fa"
> and to the ISO 639-2/B "per"), to represent the macrolanguage name
> whose English version is "Persian", which includes the language names
> whose English version is "Dari", coded as "prs", and whose English
> version is "Western Farsi", coded as "pes".
>  Moreover, ISO 639-3 also represents the language names whose
> English version is "Southwestern Fars", coded as "fay", and whose
> English version is "Northwestern Fars", coded as "faz".
> 
> Regards.
> Gérard LANG
> 
> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of John Cowan
> Sent: Monday, December 8, 2008 16:51
> To: Stephane Bortzmeyer
> Cc: ietf-languages at iana.org
> Subject: Re: Stop requiring endonyms (Was: RFC 4645bis: making 'pes'
> and 'prs' extlangs)
> 
> Stephane Bortzmeyer scripsit:
> 
> > I regard this trend (requiring endonyms) as a quite stupid one. Will
> > the British ask us to always write "London" instead of the exonym we
> > use ("Londres")? Will they send troops if we do not comply? If so, we
> > will ask the Italians to stop calling our capital "Parigi" (the
> > endonym is Paris).
> 
> Arguably the English name "Paris" is an endonym as well; in Middle 
> English and Old French, the name was unsurprisingly identical, but 
> sound-changes in both English and French have altered the 
> pronunciation of the "a", the "r", the "i" (in English only), and the 
> "s" (in French only), while leaving the orthography unchanged.
> 
> Similarly, I suppose that the many U.S. placenames of French origin 
> are pronounced as French by francophones, even though French is only 
> minimally an endogenous language of the U.S. (parts of northern New 
> England and Louisiana).
> 
> In New York's Chinatown, street signs are bilingual in English and 
> Chinese, but who's to say which is the exonym and which the endonym in 
> that case?
> 
> > Worse, and more on-topic for this list, will the English-speaking
> > people require that we call their language "english" while we always
> > used "anglais"?
> 
> The vast majority of all names are and must be endonyms.  There are
> exonyms for Warszawa (Warsaw, Varsovie, Warschau), but none for
> Zelazowa Wola, even though it was the birthplace of Chopin (whose name
> was itself something of an exonym).
> 
> When we deal with names across scripts, however, as in the Chinese and 
> Indian cases, we are always dealing with exonyms, and then there is no 
> particular advantage to having multiple exonyms, particularly in 
> writing.  International postal addresses may be written in Latin 
> script or the script of the destination (save for the country name, 
> which must appear in the language of the source), and here having more 
> than one way to write "Beijing" is nothing but a nuisance.
> 
> > To me, "persan" (the French word) is an exonym, like "german"
> > ("deutsch") or "mandarin" (don't know how to write the endonym).
> 
> Mandarin has no universal endonym; it is Putonghua 'common language'
> in the People's Republic, Baihua 'official language' in Taiwan.
> 
> --
> John Cowan  cowan at ccil.org    http://ccil.org/~cowan
> No man is an island, entire of itself; every man is a piece of the 
> continent, a part of the main.  If a clod be washed away by the sea, 
> Europe is the less, as well as if a promontory were, as well as if a 
> manor of thy friends or of thine own were: any man's death diminishes 
> me, because I am involved in mankind, and therefore never send to know 
> for whom the bell tolls; it tolls for thee.  -- John Donne 
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages