ISO 639-5 reconfirmation ballot (long)
Caoimhin O Donnaile
caoimhin at smo.uhi.ac.uk
Tue Jul 19 03:36:30 CEST 2016
> There are 26³ = 17576 possible 3-letter codes. The Registry has 8114
> primary language subtags (including deprecated ones, which don't go
> away), plus another 520 private-use allocations. I don't think we at BCP
> 47 should contemplate adding a subgrouping mechanism that requires
> anywhere near 8942 subtags, actually more than the number of languages.
The “composite trees” at Multitree are the best effort I have found at a
comprehensive system for grouping all world languages, past and present.
These have in total (in the copy I took a couple of years ago), roughly:
   3100  nodes/families/groups in the hierarchies of languages
   7600  languages
    360  dialect groups
  11270  major dialects
  22330  codes in total
As others have said, any system of grouping all world languages into
genetic groups would be in a constant state of flux as theories of
language origins are refined. Nevertheless, most of the facts of
practical importance are very stable even when the intermediate
groupings change: Welsh is always going to be a Celtic language,
regardless of whether the Insular/Continental division is considered to
be a significant fundamental grouping among the Celtic languages, and
regardless of any Italo-Celtic hypothesis, and regardless of the status
of Pictish.
I have a vision of a system where a user could request for example “all
resources in any Celtic language”. To be able to provide this, all
nodes in the hierarchy, all languages, all dialects, would be assigned
atomic, semi-mnemonic, non-transferable four-letter codes. A real-time
Internet system would be able to examine the hierarchy and answer
queries such as whether “cymr” is included in “celt”. Deprecated codes
would be maintained in the system along with a best effort at showing
their inclusion relationships to nearby non-deprecated codes. A
fine-structured hierarchy with lots of nodes would thus actually aid in
providing stability as details change. Applications could cache a full
or partial copy of the hierarchy, but the real-time online system would
be available to applications which either wanted the most up-to-date
information or which did not want to use storage space to cache a full
copy. By having nodes and languages and dialects all in the same
namespace, the heat would be taken out of political arguments about what
is a language and what is not, and the instabilities in the present
system caused by reclassifications such as et/ekk/vro would be avoided.
But maybe things would not work out as simply as I naively imagine!
And the four-letter space is unfortunately already in use by scripts and
therefore is not available.
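To make the inclusion query concrete, here is a minimal sketch in Python of
how a cached copy of such a hierarchy might answer whether “cymr” is included
in “celt”. It is only an illustration under stated assumptions: “cymr” and
“celt” are the codes used above, while “ineu”, “insu”, “bryt”, “gael” and the
deprecated code “wels” are invented for the example, as are the names PARENT,
DEPRECATED, resolve and is_included.

```python
# Hypothetical sketch: nodes, languages and dialects share one namespace of
# atomic four-letter codes. Only "cymr" and "celt" come from the message;
# every other code below is invented for illustration.
PARENT = {
    "celt": "ineu",  # Celtic under an assumed Indo-European node
    "insu": "celt",  # Insular Celtic: an intermediate node that may change
    "bryt": "insu",  # assumed Brythonic node
    "gael": "insu",  # assumed Goidelic node
    "cymr": "bryt",  # Welsh
}

# Deprecated codes stay in the system, mapped (best effort) to a nearby
# non-deprecated code.
DEPRECATED = {"wels": "cymr"}  # hypothetical retired code


def resolve(code):
    """Follow deprecation links until a non-deprecated code is reached."""
    seen = set()
    while code in DEPRECATED and code not in seen:
        seen.add(code)
        code = DEPRECATED[code]
    return code


def is_included(code, group):
    """Return True if `code` is `group` itself or lies anywhere below it."""
    code, group = resolve(code), resolve(group)
    seen = set()
    while code is not None and code not in seen:
        if code == group:
            return True
        seen.add(code)
        code = PARENT.get(code)  # None once we climb past the root
    return False


print(is_included("cymr", "celt"))  # Welsh is below Celtic
print(is_included("celt", "cymr"))  # the reverse does not hold
print(is_included("wels", "celt"))  # a deprecated code still resolves
```

If the Insular/Continental division were dropped, re-pointing “bryt” and
“gael” directly at “celt” would change only the intermediate entries of the
table; the answer for “cymr” in “celt” would stay the same, which is the kind
of stability described above.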
Caoimhín