<div dir="ltr"><div class="gmail_default"><font face="times new roman, serif">First off, I agree with everything Doug has said. For example, it is not a barrier that current best practice for the language "tree" is imperfect and may change; we already have a mechanism for accommodating change, because we need it for the language codes themselves. The real missing piece is a mapping from codes to child codes.</font></div>
<div class="gmail_default"><font face="times new roman, serif"><br></font></div>
<div class="gmail_default"><font face="times new roman, serif">As to the figures:</font></div>
<div class="gmail_default"><font face="times new roman, serif"><br></font></div>
<div class="gmail_default"><font face="times new roman, serif">> 3100 nodes/families/groups in the hierarchies of languages<br>> 7600 languages<br>> 360 dialect groups<br>> 11270 major dialects</font></div>
<div class="gmail_default"><font face="times new roman, serif"><br></font></div>
<div class="gmail_default"><font face="times new roman, serif">These figures wouldn't be a barrier. 639-5 should only contain elements of the first set, which can be encompassed in the current 3-letter space. Dialects and dialect groups would sit below the languages. In BCP47 they would be represented with language + variant, where the variant can be 5-8 letters, allowing plenty of space.</font></div>
<div class="gmail_default"><font face="times new roman, serif"><br></font></div>
<div class="gmail_default"><font face="times new roman, serif">There could be a new 639-X that contained both dialects and languages (and their groups) in an 8-letter namespace. All that would be needed is a mapping from X to 639-3/5. Ideally this mapping would be as algorithmic as possible.</font></div>
<div class="gmail_default"><font face="times new roman, serif"><br></font></div></div>
<div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><font face="'times new roman', serif">Mark</font></div></div></div>

<br><div class="gmail_quote">On Tue, Jul 19, 2016 at 3:36 AM, Caoimhin O Donnaile <span dir="ltr"><<a href="mailto:caoimhin@smo.uhi.ac.uk" target="_blank">caoimhin@smo.uhi.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

There are 26³ = 17576 possible 3-letter codes. The Registry has 8114<br>

primary language subtags (including deprecated ones, which don't go<br>

away), plus another 520 private-use allocations. I don't think we at BCP<br>

47 should contemplate adding a subgrouping mechanism that requires<br>

anywhere near 8942 subtags, actually more than the number of languages.<br>

</blockquote>

<br></span>

The “composite trees” at Multitree are the best effort I have found at a comprehensive system of grouping all world languages past and present. In total (in the copy I took a couple of years ago), these have roughly:<br>

<br>

  3100 nodes/families/groups in the hierarchies of languages<br>

  7600 languages<br>

   360 dialect groups<br>

 11270 major dialects<br>

<br>

 22330 codes in total<br>

<br>
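As a quick check of the arithmetic, here is a minimal Python sketch that tallies the figures above and compares them with the size of a fixed-length letter namespace. The names multitree and capacity are only illustrative, and the counts are the approximate ones quoted in this thread.<br>
<pre>
```python
# Approximate Multitree figures as quoted in this thread.
multitree = {
    "nodes/families/groups": 3100,
    "languages": 7600,
    "dialect groups": 360,
    "major dialects": 11270,
}
total = sum(multitree.values())


def capacity(length):
    """Number of possible codes of the given length over the letters a-z."""
    return 26 ** length


# Registry figures quoted earlier: 8114 primary language subtags
# and 520 private-use allocations already occupy the 3-letter space.
free_three_letter = capacity(3) - 8114 - 520

print(total)               # 22330
print(capacity(3))         # 17576
print(free_three_letter)   # 8942
print(capacity(4))         # 456976
```
</pre>
On these numbers the full set of 22330 codes does not fit in the 17576-code three-letter space, while the roughly 3100 grouping nodes alone fit within the 8942 codes still unused there, and a four-letter space would hold everything many times over.<br>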

As others have said, any system of grouping all world languages into genetic groups would be in a constant state of flux as theories of language origins are refined.  Nevertheless, most of the facts of practical importance are very stable even when the intermediate groupings change: Welsh is always going to be a Celtic language, regardless of whether the Insular/Continental division is considered to be a significant fundamental grouping among the Celtic languages, and regardless of any Italo-Celtic hypothesis, and regardless of the status of Pictish.<br>

<br>

I have a vision of a system where a user could request, for example, “all resources in any Celtic language”.  To be able to provide this, all nodes in the hierarchy, all languages, all dialects, would be assigned atomic, semi-mnemonic, non-transferable four-letter codes.  A real-time Internet system would be able to examine the hierarchy and answer queries such as whether “cymr” is included in “celt”.  Deprecated codes would be maintained in the system along with a best effort at showing their inclusion relationships to nearby non-deprecated codes.  A fine-structured hierarchy with lots of nodes would thus actually aid in providing stability as details change.  Applications could cache a full or partial copy of the hierarchy, but the real-time online system would be available to applications which either wanted the most up-to-date information or which did not want to use storage space to cache a full copy.  By having nodes and languages and dialects all in the same namespace, the heat would be taken out of political arguments about what is a language and what is not, and the instabilities in the present system caused by reclassifications such as et/ekk/vro would be avoided.<br>

<br>
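The inclusion query described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: “cymr” and “celt” are the codes from the example above, while every other code, the tables PARENT and REPLACED_BY, and the function is_included are hypothetical and do not reflect any existing registry or service.<br>
<pre>
```python
# Hypothetical parent links: each code points to the node directly above it.
# Only "cymr" and "celt" come from the message; the other codes are made up.
PARENT = {
    "cymr": "bryt",  # Welsh -> Brythonic (hypothetical code)
    "bryt": "insc",  # Brythonic -> Insular Celtic (hypothetical code)
    "insc": "celt",  # Insular Celtic -> Celtic
    "celt": "ineu",  # Celtic -> Indo-European (hypothetical code)
}

# Deprecated codes stay in the system and point at a current code.
REPLACED_BY = {
    "wels": "cymr",  # hypothetical retired code for Welsh
}


def is_included(code, group):
    """Return True if `code` is `group` itself or sits anywhere below it."""
    code = REPLACED_BY.get(code, code)
    seen = set()  # guards against accidental cycles in the data
    while code is not None and code not in seen:
        if code == group:
            return True
        seen.add(code)
        code = PARENT.get(code)
    return False


print(is_included("cymr", "celt"))  # True
print(is_included("wels", "celt"))  # True
print(is_included("celt", "cymr"))  # False
```
</pre>
If the Insular/Continental split were dropped and the hypothetical “bryt” node re-parented directly under “celt”, the query for “cymr” in “celt” would still return True, which is the kind of stability meant above.<br>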

But maybe things would not work out as simply as I naively imagine! And the four-letter space is unfortunately already in use by scripts and is therefore not available.<br>

<br>

Caoimhín<br>_______________________________________________<br>

Ietf-languages mailing list<br>

<a href="mailto:Ietf-languages@alvestrand.no">Ietf-languages@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/ietf-languages" rel="noreferrer" target="_blank">http://www.alvestrand.no/mailman/listinfo/ietf-languages</a><br>

<br></blockquote></div><br></div>