ISO 639-5 reconfirmation ballot (long)

Doug Ewell doug at
Tue Jul 19 05:14:11 CEST 2016

Caoimhín O Donnaile wrote:

> The “composite trees” at Multitree are the best effort I have found of
> a comprehensive system of grouping all world languages past and
> present. These have in total (in the copy I took a couple of years
> ago), roughly:
>    3100 nodes/families/groups in the hierarchies of languages
>    7600 languages
>     360 dialect groups
>   11270 major dialects
>   22330 codes in total

I didn't know we were talking about dialects -- extending the tree 
downward as well as upward. I thought we were talking about families, 
groups, hierarchies of languages. Yes, of course if you include dialects 
you would need quite a bit more than 9000 additional codes.

> Welsh is always going to be a Celtic language,

Yes, and Italian is always going to be a Romance language. The easy 
cases are always easy to encode. Does the Tsouic branch, which includes 
Kanakanabu and two other languages, belong to the Toraja-Sa'dan branch 
of the Northern branch of the South Sulawesi branch of 
Malayo-Polynesian, or is it a branch directly from Austronesian? 
Ethnologue changed this between April and May 2016.
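
To make the easy case and the hard case concrete, here is a minimal sketch of a parent-pointer hierarchy in Python. The codes cel, roa, itc and ine are ISO 639-5 group codes and cy, ga, gd, br, it are ISO 639-1 language codes; the table itself, the function names, and the use of the private-use code qaa as a stand-in for a disputed node are assumptions made purely for illustration.

```python
# Child -> parent links. A deliberately tiny, simplified table; the real
# hierarchies have thousands of nodes.
PARENT = {
    "cy": "cel",   # Welsh -> Celtic
    "ga": "cel",   # Irish -> Celtic
    "gd": "cel",   # Scottish Gaelic -> Celtic
    "br": "cel",   # Breton -> Celtic
    "it": "roa",   # Italian -> Romance
    "roa": "itc",  # Romance -> Italic
    "cel": "ine",  # Celtic -> Indo-European
    "itc": "ine",  # Italic -> Indo-European
    "qaa": "cel",  # hypothetical disputed node (private-use code)
}


def belongs_to(code, group, parent=PARENT):
    """Walk up the parent links and report whether `group` is reached."""
    while code is not None:
        if code == group:
            return True
        code = parent.get(code)
    return False


def members(group, parent=PARENT):
    """Return every code in the table that sits somewhere below `group`."""
    return sorted(c for c in parent
                  if c != group and belongs_to(c, group, parent))


# A hypothetical reclassification: one edit to the table moves the
# disputed node, and every query that depends on it changes with it.
REVISED = dict(PARENT, qaa="ine")
```

With this table, members("cel") lists Welsh, Irish, Scottish Gaelic, Breton and the disputed node, while members("cel", REVISED) no longer includes the disputed node. The stable cases cost nothing; anything built on the unstable ones has to be redone whenever the classification changes.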

> I have a vision of a system where a user could request for example
> “all resources in any Celtic language”.  To be able to provide this,
> all nodes in the hierarchy, all languages, all dialects, would be
> assigned atomic, semi-mnemonic, non-transferable four-letter codes.

This sounds exactly like ISO 639-6. Why was that standard withdrawn, 

> But maybe things would not work out so simple as I naively imagine!
> And the four-letter space is unfortunately already in use by scripts
> and therefore is not available.

Only if the intent for BCP 47 is that these four-letter subtags would 
not be primary, but would follow a two- or three-letter primary language 
subtag. Otherwise, well, we already reserved four-letter primary 
language subtags in BCP 47, thinking that 639-6 was going to be a 
success and that we would one day incorporate it.
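
The role of position can be shown with a rough Python sketch. It assumes a much simplified reading of the BCP 47 syntax (primary language, optional script, optional two-letter region) and ignores extlang, variant, extension and private-use subtags entirely; the function name and the labels are made up for this example.

```python
def classify(tag):
    """Label the subtags of a language tag by position and length.

    Simplified sketch: anything that is not a primary language, a script
    directly after it, or a two-letter region is left "unclassified".
    """
    parts = tag.split("-")
    first = parts[0]
    is_letters = first.isascii() and first.isalpha()

    if is_letters and len(first) in (2, 3):
        result = [(first, "language")]
    elif is_letters and len(first) == 4:
        # Four letters in FIRST position: reserved for future use.
        result = [(first, "reserved")]
    else:
        result = [(first, "unclassified")]

    script_allowed = True  # only the subtag right after the language
    for sub in parts[1:]:
        letters = sub.isascii() and sub.isalpha()
        if script_allowed and letters and len(sub) == 4:
            result.append((sub, "script"))
        elif letters and len(sub) == 2:
            result.append((sub, "region"))
        else:
            result.append((sub, "unclassified"))
        script_allowed = False
    return result
```

Under these assumptions, classify("sr-Latn-RS") labels Latn as a script because it follows a primary language subtag, whereas the same four letters in first position come back as reserved. The string is identical; only the position differs.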

There is no inherent conflict between four-letter script codes, ICAO 
airport codes, U.S. commercial broadcasting call letters, and 
Myers-Briggs type indicators. They occupy different code spaces. One 
sometimes reads that two-letter language codes and two-letter country 
codes are in conflict, and that it's a bug that the code for Japanese 
(ja) is different from the code for Japan (JP), and that too is a myth.
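
A small Python sketch of the same point, using a toy registry keyed by code space as well as code. The registry contents are four real assignments (ISO 639-1 ja and ca, ISO 3166-1 JP and CA); the structure and the lookup function are assumptions for illustration only.

```python
# Key = (code space, code). The same two letters can appear in more than
# one code space without any collision.
REGISTRY = {
    ("language", "ja"): "Japanese",
    ("region", "jp"): "Japan",
    ("language", "ca"): "Catalan",
    ("region", "ca"): "Canada",
}


def lookup(space, code):
    """Resolve a code within one code space; None if it is not listed."""
    return REGISTRY.get((space, code.lower()))
```

Here lookup("language", "ca") gives Catalan and lookup("region", "CA") gives Canada, and nothing requires the language code for Japanese to match the region code for Japan.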

Doug Ewell | Thornton, CO, US |

More information about the Ietf-languages mailing list