ISO 639-5 reconfirmation ballot (long)

Peter Constable petercon at
Mon Jul 18 22:26:46 CEST 2016

The lack of documentation about relationships between "collections" and individual languages is a problem that Gary Simons and I called out about 15 years ago. The problem was there in 639-2 before 639-5 was even conceived. And looking at the MARC Language Code List, which was the key source for 639-2, doesn't help a whole lot. From a dialectology perspective, it's all a bit of a mess; but if the librarians find it useful, then it's a useful mess. (Note that the librarians did not ask for and, AFAIK, do not use the additional collections added in 639-5.) 

So, IMO, it is what it is: just leave it be, and don't have any great (or even modest) expectations of it. If someone decides we really need a more comprehensive coding system for collections, then perhaps an extension using a new system not constrained to alpha-3 IDs in the 639 ID space would be the best approach.


-----Original Message-----
From: Ietf-languages [mailto:ietf-languages-bounces at] On Behalf Of Doug Ewell
Sent: Saturday, July 16, 2016 8:20 PM
To: ietf-languages <ietf-languages at>; Anthony Aristar <anthony at>
Subject: Re: ISO 639-5 reconfirmation ballot (long)

To the extent that Anthony is arguing that ISO 639-5 language collections can't be correlated with individual languages, I certainly agree that that is a problem.

To pick one example, ISO 639-5 provides the following hierarchy for [cmc], "Chamic languages":

map : poz : pqw : cmc

This denotes the following relationship:

[map] Austronesian languages
    +-- [poz] Malayo-Polynesian languages
        +-- [pqw] Western Malayo-Polynesian languages
            +-- [cmc] Chamic languages

But there's no way to look up what individual languages are contained within [cmc]. For that matter, we can't tell except by exhaustive scanning whether [cmc] contains other, lower-level collections.
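
As a rough illustration of the "exhaustive scanning" mentioned above, here is a minimal Python sketch. It assumes a table that maps each collection code to a hierarchy string in the "a : b : c" form shown earlier. The table below holds only the [cmc] chain from this message (the entries for [map], [poz] and [pqw] are derived from that chain), and the names HIERARCHIES, parse_chain and child_collections are hypothetical, not part of any published tool.

```python
# Sample table: collection code -> hierarchy string.
# Assumption: only the [cmc] chain quoted in this message is included;
# a real ISO 639-5 table would contain many more entries.
HIERARCHIES = {
    "map": "map",
    "poz": "map : poz",
    "pqw": "map : poz : pqw",
    "cmc": "map : poz : pqw : cmc",
}


def parse_chain(hierarchy):
    """Split 'map : poz : pqw' into ['map', 'poz', 'pqw']."""
    return [part.strip() for part in hierarchy.split(":")]


def child_collections(code, hierarchies):
    """Return the collections whose immediate parent is `code`.

    The table only records each collection's ancestors, so the only
    way to find children is to scan every entry.
    """
    children = []
    for other, hierarchy in hierarchies.items():
        chain = parse_chain(hierarchy)
        # A direct child lists `code` as the second-to-last element
        # of its own chain.
        if len(chain) >= 2 and chain[-1] == other and chain[-2] == code:
            children.append(other)
    return sorted(children)


if __name__ == "__main__":
    print(child_collections("pqw", HIERARCHIES))
    print(child_collections("cmc", HIERARCHIES))
```

Note that this only answers the question about lower-level collections. Nothing in the hierarchy strings says which individual languages belong to [cmc], so no amount of scanning recovers that.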

I don't know if this can realistically be solved; see my earlier comment about Ethnologue attempting to keep track of their own hierarchy, and changing the relationships with some frequency. Still, I can see that it limits the usefulness of the collections. Going back to my example of tagging something as "Hmong-Mien languages," that might not help if there is no common agreement on the members of the set of Hmong-Mien languages.

I'm not quite as sympathetic to why it is such a problem that collection codes cannot be easily distinguished at sight from individual language codes. I'm sure I'm missing something obvious here.

Doug Ewell | Thornton, CO, US |
