ISO 639-5 reconfirmation ballot (long)

Caoimhin O Donnaile caoimhin at smo.uhi.ac.uk
Tue Jul 19 03:36:30 CEST 2016


> There are 26³ = 17576 possible 3-letter codes. The Registry has 8114
> primary language subtags (including deprecated ones, which don't go
> away), plus another 520 private-use allocations. I don't think we at BCP
> 47 should contemplate adding a subgrouping mechanism that requires
> anywhere near 8942 subtags, actually more than the number of languages.

The “composite trees” at Multitree are the best effort I have found at a 
comprehensive system of grouping all world languages past and present. 
These have in total (in the copy I took a couple of years ago), roughly:

   3100 nodes/families/groups in the hierarchies of languages
   7600 languages
    360 dialect groups
  11270 major dialects

  22330 codes in total
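
As a quick check of these rough figures against the three-letter space mentioned in the quoted text, here is a small sketch. The dictionary keys are just labels chosen for illustration; the numbers are the ones listed above.

```python
# Rough counts from the Multitree copy described above.
counts = {
    "nodes/families/groups": 3100,
    "languages": 7600,
    "dialect groups": 360,
    "major dialects": 11270,
}

total = sum(counts.values())

# Number of possible three-letter codes, as in the quoted text (26 cubed).
three_letter_space = 26 ** 3

print(total)                       # 22330
print(three_letter_space)          # 17576
print(total > three_letter_space)  # True
```

On these figures, a single namespace covering nodes, languages and dialects would not fit into three-letter codes at all.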

As others have said, any system of grouping all world languages into 
genetic groups would be in a constant state of flux as theories of 
language origins are refined.  Nevertheless, most of the facts of 
practical importance are very stable even when the intermediate 
groupings change: Welsh is always going to be a Celtic language, 
regardless of whether the Insular/Continental division is considered to 
be a significant fundamental grouping among the Celtic languages, and 
regardless of any Italo-Celtic hypothesis, and regardless of the status 
of Pictish.

I have a vision of a system where a user could request for example “all 
resources in any Celtic language”.  To be able to provide this, all 
nodes in the hierarchy, all languages, all dialects, would be assigned 
atomic, semi-mnemonic, non-transferable four-letter codes.  A real-time 
Internet system would be able to examine the hierarchy and answer 
queries such as whether “cymr” is included in “celt”.  Deprecated codes 
would be maintained in the system along with a best effort at showing 
their inclusion relationships to nearby non-deprecated codes.  A 
fine-structured hierarchy with lots of nodes would thus actually aid in 
providing stability as details change.  Applications could cache a full 
or partial copy of the hierarchy, but the real-time online system would 
be available to applications which either wanted the most up-to-date 
information or which did not want to use storage space to cache a full 
copy.  By having nodes and languages and dialects all in the same 
namespace, the heat would be taken out of political arguments about what 
is a language and what is not, and the instabilities in the present 
system caused by reclassifications such as et/ekk/vro would be avoided.
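
A minimal sketch of the kind of inclusion query described above, assuming the hierarchy is held as a simple child-to-parent table. Only “celt” and “cymr” come from this message; every other code, the DEPRECATED table, and the function names are hypothetical and invented purely for illustration.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: each code maps to the node that directly contains it.
# Nodes, languages and dialects all share one namespace.
# Only "celt" and "cymr" appear in the message; the rest are invented.
PARENT: Dict[str, Optional[str]] = {
    "ineu": None,    # assumed root node (Indo-European)
    "celt": "ineu",  # Celtic
    "insu": "celt",  # assumed intermediate node (Insular Celtic)
    "bryt": "insu",  # assumed node (Brythonic)
    "cymr": "bryt",  # Welsh
}

# Hypothetical deprecated codes, each with a best-effort pointer to a
# nearby non-deprecated code.
DEPRECATED: Dict[str, str] = {
    "wels": "cymr",
}


def resolve(code: str) -> str:
    """Follow deprecation pointers until a non-deprecated code is reached."""
    seen = set()
    while code in DEPRECATED:
        if code in seen:
            raise ValueError(f"deprecation cycle at {code!r}")
        seen.add(code)
        code = DEPRECATED[code]
    return code


def is_included(code: str, group: str) -> bool:
    """Return True if `code` is `group` itself or lies somewhere below it."""
    current: Optional[str] = resolve(code)
    target = resolve(group)
    if current not in PARENT or target not in PARENT:
        raise KeyError("unknown code")
    # Walk upwards from the code towards the root, looking for the group.
    while current is not None:
        if current == target:
            return True
        current = PARENT[current]
    return False


print(is_included("cymr", "celt"))  # True
print(is_included("celt", "cymr"))  # False
print(is_included("wels", "celt"))  # True, via the deprecated-code pointer

# If the intermediate grouping is abandoned, only one link changes
# and the answer of practical importance stays the same.
PARENT["bryt"] = "celt"
print(is_included("cymr", "celt"))  # True
```

The last step is meant to illustrate the stability argument: dropping or moving an intermediate node changes one parent link, while the question “is cymr in celt” keeps the same answer.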

But maybe things would not work out as simply as I naively imagine! 
And the four-letter space is unfortunately already in use by scripts and 
therefore is not available.

Caoimhín

