ISO 639-5 reconfirmation ballot (long)

Mark Davis ☕️ mark at macchiato.com
Tue Jul 19 04:55:55 CEST 2016


First off, I agree with all of what Doug has said. For example, it is not a
barrier that current best practice of the language "tree" is imperfect and
may change; we already have a mechanism for accommodating change, because
we need it with language codes themselves. The real missing piece is a
mapping from codes to child codes.
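
To make the "codes to child codes" idea concrete, here is a minimal sketch in Python. The table entries are illustrative examples only, not registry data, and the names CHILDREN, descendants and is_included are hypothetical; nothing like this is defined by 639-5 or BCP47 today.

```python
# Hypothetical parent -> children table. The entries are illustrative
# examples, not data taken from ISO 639-5 or the subtag registry.
CHILDREN = {
    "ine": ["cel"],                       # Indo-European -> Celtic
    "cel": ["cym", "gle", "gla", "bre"],  # Celtic -> some Celtic languages
}


def descendants(code, children=CHILDREN):
    """Return a sorted list of every code below `code` (itself excluded)."""
    seen = set()
    stack = list(children.get(code, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against accidental cycles in the table
        seen.add(current)
        stack.extend(children.get(current, []))
    return sorted(seen)


def is_included(code, group, children=CHILDREN):
    """True if `code` sits anywhere below `group` in the table."""
    return code in descendants(group, children)
```

With such a table, a request like "everything under cel" reduces to descendants("cel"), and a change in the intermediate groupings only means editing table entries, not the codes themselves.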

As to the figures:

>   3100 nodes/families/groups in the hierarchies of languages
>   7600 languages
>    360 dialect groups
>  11270 major dialects

These figures wouldn't be a barrier. 639-5 should only contain elements of
the first set, which can be encompassed in the current 3-letter space.
Dialects and dialect groups would sit below the languages. In BCP47 they
would be represented with language + variant, where variant can be 5-8
letters, allowing plenty of space.
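
As a rough check of the arithmetic, and a sketch of the language + variant form, assuming the registry figures quoted further down in this thread (8114 primary language subtags plus 520 private-use codes). The helper dialect_tag is hypothetical, and the pattern only covers the 5-8 letter case mentioned above; BCP47 allows some other variant shapes that are ignored here.

```python
import re

# Capacity of the 3-letter space, using the figures quoted in this thread.
TOTAL_3_LETTER = 26 ** 3       # all possible 3-letter codes
USED = 8114 + 520              # primary language subtags + private use
FREE = TOTAL_3_LETTER - USED   # codes still unassigned
GROUP_NODES = 3100             # nodes/families/groups in the figures above

# Simplification: only the 5-8 letter variant shape is modelled.
VARIANT_RE = re.compile(r"^[a-z]{5,8}$")


def dialect_tag(language, variant):
    """Build a language + variant tag, e.g. for a dialect of a language."""
    variant = variant.lower()
    if not VARIANT_RE.match(variant):
        raise ValueError(f"variant must be 5-8 letters: {variant!r}")
    return f"{language.lower()}-{variant}"


print(FREE, GROUP_NODES <= FREE)   # 8942 True
print(dialect_tag("sl", "rozaj"))  # sl-rozaj
```

So the group nodes fit in the unassigned 3-letter codes, and dialects need no new code space at all because they hang off the language subtag.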

There could be a new 639-X that contained both dialects and languages
(and their groups) in an 8-letter namespace. All that would be needed
would be a mapping from X to 639-3/5. Ideally this mapping would be as
algorithmic as possible.


Mark

On Tue, Jul 19, 2016 at 3:36 AM, Caoimhin O Donnaile <caoimhin at smo.uhi.ac.uk> wrote:

>> There are 26³ = 17576 possible 3-letter codes. The Registry has 8114
>> primary language subtags (including deprecated ones, which don't go
>> away), plus another 520 private-use allocations. I don't think we at BCP
>> 47 should contemplate adding a subgrouping mechanism that requires
>> anywhere near 8942 subtags, actually more than the number of languages.
>>
>
> The “composite trees” at Multitree are the best effort I have found of a
> comprehensive system of grouping all world languages past and present.
> These have in total (in the copy I took a couple of years ago), roughly:
>
>   3100 nodes/families/groups in the hierarchies of languages
>   7600 languages
>    360 dialect groups
>  11270 major dialects
>
>  22330 codes in total
>
> As others have said, any system of grouping all world languages into
> genetic groups would be in a constant state of flux as theories of language
> origins are refined.  Nevertheless, most of the facts of practical
> importance are very stable even when the intermediate groupings change:
> Welsh is always going to be a Celtic language, regardless of whether the
> Insular/Continental division is considered to be a significant fundamental
> grouping among the Celtic languages, and regardless of any Italo-Celtic
> hypothesis, and regardless of the status of Pictish.
>
> I have a vision of a system where a user could request for example “all
> resources in any Celtic language”.  To be able to provide this, all nodes
> in the hierarchy, all languages, all dialects, would be assigned atomic,
> semi-mnemonic, non-transferable four-letter codes.  A real-time Internet
> system would be able to examine the hierarchy and answer queries such as
> whether “cymr” is included in “celt”.  Deprecated codes would be maintained
> in the system along with a best effort at showing their inclusion
> relationships to nearby non-deprecated codes.  A fine-structured hierarchy
> with lots of nodes would thus actually aid in providing stability as
> details change.  Applications could cache a full or partial copy of the
> hierarchy, but the real-time online system would be available to
> applications which either wanted the most up-to-date information or which
> did not want to use storage space to cache a full copy.  By having nodes and
> languages and dialects all in the same namespace, the heat would be taken
> out of political arguments about what is a language and what is not, and
> the instabilities in the present system caused by reclassifications such as
> et/ekk/vro would be avoided.
>
> But maybe things would not work out so simply as I naively imagine! And
> the four-letter space is unfortunately already in use by scripts and
> therefore is not available.
>
> Caoimhín