ISO 639-5 reconfirmation ballot (long)

Anthony Aristar anthony at
Mon Jul 18 22:51:13 CEST 2016

Well, there are two reasons why using alpha-3 codes is a very bad idea.  
First, the number of codes we would need in order to code any possible 
subgrouping hypothesis is very large.  I question whether the alpha-3 
code-space is large enough for both languages and subgroups.  Second, 
we're missing the human factor here.  ISO 639-3 is used extensively by 
linguists now.  A number of them have personally said to me that they 
don't want to have to look up a code to know whether it refers to a 
subgroup hypothesis or a language.
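
As a rough illustration of the code-space concern, the arithmetic can be sketched as below. The qaa..qtz local-use range comes from ISO 639-2; the count of already-assigned language codes is a placeholder assumption for illustration only, not a figure from this thread.

```python
import string

# The alpha-3 code space: every string of three lowercase ASCII letters.
ALPHABET = string.ascii_lowercase
TOTAL_CODES = len(ALPHABET) ** 3  # 26 * 26 * 26

# ISO 639-2 reserves qaa..qtz for local use:
# first letter 'q', second letter 'a'..'t' (20 letters), any third letter.
LOCAL_USE = 20 * len(ALPHABET)


def remaining_codes(assigned: int) -> int:
    """Codes left for anything else once `assigned` codes are taken."""
    return TOTAL_CODES - LOCAL_USE - assigned


# ASSUMPTION: 7800 is a round placeholder for the number of codes
# already given to individual languages; substitute the real count.
print(TOTAL_CODES, LOCAL_USE, remaining_codes(7800))
```

Whatever the exact count of assigned codes, the remainder has to cover every future language as well as every subgrouping hypothesis anyone might want to code.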

As for a replacement for ISO 639-5... well, I could suggest you look at 
the Multitree system, though I hesitate to suggest it since I was 
responsible for it.  Perfect it is not, but it is eminently usable, for 
the system is very extensive indeed, and since codes are defined by 
trees which are as complete as possible, you can always tell what 
languages would fall under them.

On 7/18/2016 3:26 PM, Peter Constable wrote:
> The lack of documentation about relationships between "collections" and individual languages is a problem that Gary Simons and I called out about 15 years ago. The problem was there in 639-2 before 639-5 was even conceived. And looking at the MARC Language Code List, which was the key source for 639-2, doesn't help a whole lot. From a dialectology perspective, it's all a bit of a mess; but if the librarians find it useful, then it's a useful mess. (Note that the librarians did not ask for and, AFAIK, do not use the additional collections added in 639-5.)
> So, IMO, it is what it is, just leave it be, and don't have any great (or even modest) expectations of it. If someone decides we really need a more comprehensive coding system for collections, then perhaps an extension using a new system not constrained to alpha-3 IDs in the 639 ID space would be the best approach.
> Peter
> -----Original Message-----
> From: Ietf-languages [mailto:ietf-languages-bounces at] On Behalf Of Doug Ewell
> Sent: Saturday, July 16, 2016 8:20 PM
> To: ietf-languages <ietf-languages at>; Anthony Aristar <anthony at>
> Subject: Re: ISO 639-5 reconfirmation ballot (long)
> To the extent that Anthony is arguing that ISO 639-5 language collections can't be correlated with individual languages, I certainly agree that that is a problem.
> To pick one example, ISO 639-5 provides the following hierarchy for [cmc], "Chamic languages":
> map : poz : pqw : cmc
> This denotes the following relationship:
> [map] Austronesian languages
>      +-- [poz] Malayo-Polynesian languages
>          +-- [pqw] Western Malayo-Polynesian languages
>              +-- [cmc] Chamic languages
> But there's no way to look up what individual languages are contained within [cmc]. For that matter, we can't tell except by exhaustive scanning whether [cmc] contains other, lower-level collections.
> I don't know if this can realistically be solved; see my earlier comment about Ethnologue attempting to keep track of their own hierarchy, and changing the relationships with some frequency. Still, I can see that it limits the usefulness of the collections. Going back to my example of tagging something as "Hmong-Mien languages," that might not help if there is no common agreement on the members of the set of Hmong-Mien languages.
> I'm not quite as sympathetic to why it is such a problem that collection codes cannot be easily distinguished at sight from individual language codes. I'm sure I'm missing something obvious here.
> --
> Doug Ewell | Thornton, CO, US |
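
The "exhaustive scanning" mentioned in the quoted message for [cmc] can be sketched roughly as follows. This is a minimal illustration, not an actual ISO 639-5 tool: the table holds only the four collections named in the quoted tree, stored in the same "map : poz : pqw : cmc" form, and the function names are hypothetical.

```python
# Hierarchy strings in the "a : b : c" form used in the quoted message.
# Only the four collections from the quoted [cmc] tree are included;
# a real table would need one row per ISO 639-5 collection.
HIERARCHIES = {
    "map": "map",
    "poz": "map : poz",
    "pqw": "map : poz : pqw",
    "cmc": "map : poz : pqw : cmc",
}


def parse_path(hierarchy: str) -> list[str]:
    """Split 'map : poz : pqw' into ['map', 'poz', 'pqw']."""
    return [part.strip() for part in hierarchy.split(":")]


def ancestors(code: str) -> list[str]:
    """Collections above `code`, outermost first (a direct lookup)."""
    return parse_path(HIERARCHIES[code])[:-1]


def sub_collections(code: str) -> list[str]:
    """Collections below `code`, found only by scanning every row."""
    found = []
    for other, hierarchy in HIERARCHIES.items():
        if other != code and code in parse_path(hierarchy)[:-1]:
            found.append(other)
    return sorted(found)


print(ancestors("cmc"))
print(sub_collections("poz"))
print(sub_collections("cmc"))
```

Going upward is a single lookup, while going downward means reading every row, and nothing in this data says which individual languages belong to a collection.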
