ISO 639-5 reconfirmation ballot (long)
doug at ewellic.org
Tue Jul 19 05:14:11 CEST 2016
Caoimhín O Donnaile wrote:
> The “composite trees” at Multitree are the best effort I have found of
> a comprehensive system of grouping all world languages past and
> present. These have in total (in the copy I took a couple of years
> ago), roughly:
> 3100 nodes/families/groups in the hierarchies of languages
> 7600 languages
> 360 dialect groups
> 11270 major dialects
> 22330 codes in total
I didn't know we were talking about dialects -- extending the tree
downward as well as upward. I thought we were talking about families,
groups, hierarchies of languages. Yes, of course if you include dialects
you would need quite a bit more than 9000 additional codes.
> Welsh is always going to be a Celtic language,
Yes, and Italian is always going to be a Romance language. The easy
cases are always easy to encode. Does the Tsouic branch, which includes
Kanakanabu and two other languages, belong to the Toraja-Sa'dan branch
of the Northern branch of the South Sulawesi branch of
Malayo-Polynesian, or is it a branch directly from Austronesian?
Ethnologue changed this between April and May 2016.
> I have a vision of a system where a user could request for example
> “all resources in any Celtic language”. To be able to provide this,
> all nodes in the hierarchy, all languages, all dialects, would be
> assigned atomic, semi-mnemonic, non-transferable four-letter codes.
This sounds exactly like ISO 639-6. Why was that standard withdrawn?
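To make the quoted idea concrete: a request like "all resources in any Celtic language" amounts to a subtree query over the hierarchy. The following is a minimal Python sketch. It assumes a hypothetical, heavily simplified fragment of the Celtic tree keyed by English names rather than by any real code set, and the function names `descendants` and `resources_in` are illustrative only.

```python
# Hypothetical, simplified fragment of a language hierarchy: each node maps
# to its child nodes. Leaves are languages; English names stand in for codes.
HIERARCHY: dict[str, list[str]] = {
    "Celtic": ["Brythonic", "Goidelic"],
    "Brythonic": ["Welsh", "Breton", "Cornish"],
    "Goidelic": ["Irish", "Scottish Gaelic", "Manx"],
}


def descendants(node: str) -> set[str]:
    """Return the node itself plus every node below it in the hierarchy."""
    result = {node}
    for child in HIERARCHY.get(node, []):
        result |= descendants(child)
    return result


def resources_in(node: str, resources: dict[str, str]) -> list[str]:
    """Return (sorted) resource IDs whose language falls under `node`."""
    wanted = descendants(node)
    return sorted(rid for rid, lang in resources.items() if lang in wanted)
```

With resources `{"r1": "Welsh", "r2": "Manx", "r3": "Italian"}`, asking for "Celtic" yields `["r1", "r2"]`. Note that the query depends only on where each node sits in the tree, so reparenting a branch (as in the Tsouic case) changes the answer without changing any code.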
> But maybe things would not work out so simple as I naively imagine!
> And the four-letter space is unfortunately already in use by scripts
> and therefore is not available.
Only if the intent for BCP 47 is that these four-letter subtags would
not be primary, but would follow a two- or three-letter primary language
subtag. Otherwise, well, we already reserved four-letter primary
language subtags in BCP 47, thinking that 639-6 was going to be a
success and that we would one day incorporate it.
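As a rough illustration of why a four-letter subtag is not ambiguous in BCP 47, its meaning depends on its position. Under the RFC 5646 grammar, a four-letter first subtag is a (reserved) primary language subtag, while a four-letter subtag following the language subtag is a script subtag. Below is a minimal Python sketch that assumes only the language, script and region positions matter. Extlang, variant, extension, private-use and grandfathered tags are not handled, and `classify` is a hypothetical helper, not a real library function.

```python
def _is_alpha(s: str) -> bool:
    # ASCII letters only, as in the RFC 5646 ABNF (str.isalpha alone accepts non-ASCII).
    return s.isascii() and s.isalpha()


def classify(tag: str) -> list[tuple[str, str]]:
    """Label the leading subtags of a language tag by their position.

    Simplified sketch: handles language, script and region only; anything
    after those positions is labeled "other".
    """
    subtags = tag.split("-")
    first = subtags[0]
    # Primary language subtag: 2-3 letters, 4 letters (reserved), or 5-8 letters.
    if not (_is_alpha(first) and 2 <= len(first) <= 8):
        raise ValueError(f"not a language subtag: {first!r}")
    kind = "language (reserved)" if len(first) == 4 else "language"
    result = [(first, kind)]
    i = 1
    # A four-letter subtag *after* the language subtag is a script subtag.
    if i < len(subtags) and len(subtags[i]) == 4 and _is_alpha(subtags[i]):
        result.append((subtags[i], "script"))
        i += 1
    # Region subtag: two letters or three ASCII digits.
    if i < len(subtags):
        s = subtags[i]
        if (len(s) == 2 and _is_alpha(s)) or (len(s) == 3 and s.isascii() and s.isdigit()):
            result.append((s, "region"))
            i += 1
    result.extend((s, "other") for s in subtags[i:])
    return result
```

For example, `classify("en-Latn-US")` labels `Latn` as a script, whereas `classify("abcd")` labels `abcd` as a reserved language subtag.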
There is no inherent conflict between four-letter script codes, ICAO
airport codes, U.S. commercial broadcasting call letters, and
Myers-Briggs type indicators. They occupy different code spaces. One
sometimes reads that two-letter language codes and two-letter country
codes conflict, or that it's a bug that the code for Japanese (ja)
differs from the code for Japan (JP); that too is a myth.
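The point about separate code spaces can be sketched as a lookup keyed by code type as well as code value. This is a minimal Python illustration with a hand-picked, hypothetical four-entry table, not the IANA registry itself. It shows that `ar` (Arabic) and `AR` (Argentina) coexist without conflict, just as `ja` and `JP` need not match. Matching is case-insensitive, as BCP 47 subtags are.

```python
# Hypothetical excerpt: the key is (code space, code), so identical letters
# in different code spaces never collide.
REGISTRY: dict[tuple[str, str], str] = {
    ("language", "ja"): "Japanese",
    ("language", "ar"): "Arabic",
    ("region", "jp"): "Japan",
    ("region", "ar"): "Argentina",
}


def lookup(kind: str, code: str) -> str:
    """Look up a code within one code space; codes compare case-insensitively."""
    return REGISTRY[(kind, code.lower())]
```

Here `lookup("language", "ar")` and `lookup("region", "AR")` return different names because the code space is part of the key.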
Doug Ewell | Thornton, CO, US | ewellic.org