ISO 639-5 reconfirmation ballot (long)

Tue Jul 19 17:32:45 CEST 2016

> Actually, ISO 639-6 used existing three-letter codes for languages when 
> possible, and assigned four-letter codes for groupings (moving up the tree) 
> and dialects and other variants (moving down the tree).
>
> So the only difference is that Caoimhín's system would use four-letter codes 
> uniformly. It seems that 639-6 actually would have been a better fit for 
> Anthony's use case of being able to identify an individual language code at 
> sight.

Yes, and I can see Anthony’s point about not wanting to pollute a 
relatively clean system of three-letter codes for languages with a small 
number of almost unused codes for language groupings.  (Although the 
clean system is already slightly ‘polluted’ by codes for 
“macrolanguages”.)

In the “ideal” system I was suggesting, the fact that codes for dialects 
and languages and nodes in the genetic hierarchy would all be 
four-letter and indistinguishable at sight is actually intended as a 
desirable feature, very important for stability - although I recognise 
that this has disadvantages as well as advantages.

It would mean that we would not have had all the anguish about whether 
to assign a code for Elfdalian.  It would mean that we would avoid 
confusing situations such as whether:
   et = est = ekk+vro (as SIL/Ethnologue/ISO seem to think)
   et = ekk (as Google Translate and lots of others seem to think)
and lots of similar situations:
   az/aze/azj/azb
   ms/msa/zsm/...
   ar/ara/arb/...
   sq/sqi/aln/als
   sw/swa/swh
   sh...

> This sounds exactly like ISO 639-6. Why was that standard withdrawn,
> then?

I was only ever vaguely aware of ISO 639-6 but I never saw anything 
usable come out of it.  Perhaps it was too dependent on one or two 
individuals who moved on?  Perhaps it was too ambitious for the 
available resources and infrastructure?  The Wikipedia description 
certainly sounds very like what I was suggesting myself, but I never saw 
anything like that emerge, otherwise I might well be using it.  I 
vaguely remember seeing very detailed longer codes in a codespace too 
dense to be extendible, and only available in pdf form.  In 2012 the 
database supporting the standard was still being promised in the coming 
months:
  http://web.archive.org/web/20120314165525/http://www.geolang.com/iso_639-6.php
but it looks as if geolang.com moved on to be a cyber security company.

I think for a genetic grouping system for languages to work, there is a 
need to:

(1) Accept that it has to be a real-time, online, algorithmic system in 
constant flux in response to ongoing specialist linguistic advice.  The 
system would not attempt to enumerate a list of all Celtic languages for 
example.  The structure would actually be very simple, with each 
code just pointing to its parent code.  Irish would point to Goidelic, 
which would point, possibly via intermediate nodes, to Celtic.  The 
effects of changes, such as whether there was a Continental/Insular 
Celtic split, would thus be localised.

  [Each code would also be labelled with an ‘order’ parameter, specifying
   its order among its sibs in a sensible linearization of the tree,
   which would keep the most similar languages closest together.  This
   useful feature is missing in all existing systems, meaning that people
   have to fall back to ugly semi-alphabetic linearizations.]

(2) Accept that there would be an ongoing need to maintain basic 
information on hundreds, or possibly thousands, of deprecated codes: just 
their main inclusion relations to the nearest current non-deprecated 
codes.
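
To make points (1) and (2) concrete, here is a minimal sketch in Python of 
what such a parent-pointer table might look like.  Everything in it is an 
assumption for illustration only: the four-letter codes ("celt", "insu", 
"goid", "iris", ...) are invented and are not taken from ISO 639-6 or any 
other standard, and the field and function names are hypothetical.  It 
also assumes the table contains no cycles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    code: str                 # hypothetical four-letter code
    parent: Optional[str]     # code of the parent node, None for a root
    order: int = 0            # position among siblings in the linearization
    deprecated: bool = False  # deprecated codes keep only their parent link


def ancestors(nodes: Dict[str, Node], code: str) -> List[str]:
    """Follow parent pointers from `code` up to the root."""
    chain = []
    parent = nodes[code].parent
    while parent is not None:
        chain.append(parent)
        parent = nodes[parent].parent
    return chain


def linearize(nodes: Dict[str, Node], root: str) -> List[str]:
    """Depth-first listing of current codes, siblings sorted by `order`."""
    out = [root]
    children = sorted(
        (n for n in nodes.values() if n.parent == root and not n.deprecated),
        key=lambda n: n.order,
    )
    for child in children:
        out.extend(linearize(nodes, child.code))
    return out


def nearest_current(nodes: Dict[str, Node], code: str) -> Optional[str]:
    """Map a deprecated code to the nearest non-deprecated code above it."""
    while nodes[code].deprecated:
        parent = nodes[code].parent
        if parent is None:
            return None
        code = parent
    return code


# Invented sample data: codes are placeholders, not real assignments.
NODES = {n.code: n for n in [
    Node("celt", None),
    Node("insu", "celt", order=1),   # the disputed Insular Celtic node
    Node("goid", "insu", order=1),
    Node("bryt", "insu", order=2),
    Node("iris", "goid", order=1),
    Node("scga", "goid", order=2),
    Node("manx", "goid", order=3),
    Node("wels", "bryt", order=1),
    Node("bret", "bryt", order=2),
    Node("irxx", "iris", deprecated=True),  # a retired code, kept for lookup
]}
```

In this sketch, dropping the Insular node would only mean re-pointing 
"goid" and "bryt" at "celt"; the record for Irish itself would not change, 
which is the sense in which the effects of a reclassification stay 
localised.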

Caoimhín