ISO 639-5 reconfirmation ballot (long)

Tue Jul 19 19:16:39 CEST 2016

Caoimhín Ó Donnaíle wrote:

> Yes, and I can see Anthony’s point about not wanting to pollute a
> relatively clean system of three-letter codes for languages with a
> small number of almost unused codes for language groupings. (Although
> the clean system is already slightly ‘polluted’ by codes for
> “macrolanguages”.)

Chinese and Arabic would be interesting cases indeed, were it not for
macrolanguages. Instead of everyone being slightly dissatisfied with the
outcome, one side, either the lumpers or the splitters, would be
outraged and would declare the standard useless.

> In the “ideal” system I was suggesting, the fact that codes for
> dialects and languages and nodes in the genetic hierarchy would all be
> four-letter and indistinguishable at sight is actually intended as a
> desirable feature, very important for stability - although I recognise
> that this has disadvantages as well as advantages.

This is an important point. Different users have different needs and
wishes, and no system is going to satisfy all of them. So when decisions
are made, it's often helpful to understand whose need was thus met and
what the tradeoffs were. I'm reminded of an article some years ago where
the author declared ISO 3166-1 to be "so bad [as to be] beyond belief"
because it assigned a code element to Bouvet Island.

> It would mean that we would not have had all the anguish about whether
> to assign a code for Elfdalian. It would mean that we would avoid
> confusing situations such as whether:
> et = est = ekk+vro (as SIL/Ethnologue/ISO seems to think)
> et = ekk (as Google Translate and lots of others seem to think)
> and lots of similar situations:
> az/aze/azj/azb
> my/msa/zsm/...
> ar/ara/arb/...
> sq/sqi/aln/als
> sw/swa/swh
> sh...

I'm guessing that would be the exact opposite of meeting Anthony's need:
to distinguish codes for individual languages plainly from codes for
anything larger or smaller.
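
To make the et/est/ekk ambiguity quoted above concrete, here is a minimal Python sketch. It uses only the one mapping mentioned in the quoted text (est = ekk + vro). The table names and the two helper functions are invented for illustration and do not correspond to any registry API.

```python
# Illustrative data only: the single case quoted above.
ALPHA2_TO_ALPHA3 = {"et": "est"}
MACROLANGUAGE_MEMBERS = {"est": ["ekk", "vro"]}


def as_macrolanguage(code):
    """Reading 1: expand the code to every member of its macrolanguage."""
    alpha3 = ALPHA2_TO_ALPHA3.get(code, code)
    return set(MACROLANGUAGE_MEMBERS.get(alpha3, [alpha3]))


def as_dominant_member(code):
    """Reading 2: treat the code as meaning only the first listed member.

    Assumption for this sketch: the first entry in the member list is the
    variety most users have in mind.
    """
    alpha3 = ALPHA2_TO_ALPHA3.get(code, code)
    members = MACROLANGUAGE_MEMBERS.get(alpha3, [alpha3])
    return {members[0]}
```

Under the first reading, content tagged "et" may be in either ekk or vro; under the second it is ekk only. The two readings agree for any code that is not a macrolanguage, which is why the difference is easy to overlook.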

> I was only ever vaguely aware of ISO 639-6 but I never saw anything
> usable come out of it. Perhaps it was too dependent on one or two
> individuals who moved on? Perhaps it was too ambitious for the
> available resources and infrastructure? The Wikipedia description
> certainly sounds very like what I was suggesting myself, but I never
> saw anything like that emerge, otherwise I might well be using it. I
> vaguely remember seeing very detailed longer codes in a codespace too
> dense to be extendible, and only available in PDF form. In 2012 the
> database supporting the standard was still being promised in the
> coming months:
> http://web.archive.org/web/20120314165525/http://www.geolang.com/iso_639-6.php
> but it looks as if geolang.com moved on to be a cyber security
> company.

Forwarding to Debbie, who might be able to shed light on this.

> I think for a genetic grouping system for languages to work, there is
> a need to:
>
> (1) Accept that it has to be a real-time, online, algorithmic system
> in constant flux in response to ongoing specialist linguistic advice.
> The system would not attempt to enumerate a list of all Celtic
> languages for example. The structure would actually be very, very
> simple with each code just pointing to its parent code. Irish would
> point to Goidelic, which would point, possibly via intermediate nodes,
> to Celtic. The effects of changes, such as whether there was a
> Continental/Insular Celtic split, would thus be localised.
>
> [Each code would also be labelled with an ‘order’ parameter,
> specifying its order among its sibs in a sensible linearization of the
> tree, which would keep the most similar languages closest together.
> This useful feature is missing in all existing systems, meaning that
> people have to fall back to ugly semi-alphabetic linearizations.]
>
> (2) Accept that there would be an ongoing need to maintain basic
> information on hundreds, or possibly thousands, of deprecated codes:
> just their main inclusion relations to the nearest current non-
> deprecated codes.

This sounds like a great project, possibly very useful to some people,
and completely outside of anything BCP 47 tries to do.
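
For anyone who wants to picture the structure described in (1) and (2) above, here is a minimal Python sketch. The four-letter codes, the field names and the function names are all invented for illustration; nothing here comes from ISO 639 or BCP 47, and "xold" is a made-up deprecated code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    code: str                     # hypothetical four-letter code
    name: str
    parent: Optional[str] = None  # code of the parent node; None for a root
    order: int = 0                # position among siblings
    deprecated: bool = False


def sample_registry() -> Dict[str, Node]:
    """Build a tiny illustrative registry (all codes are invented)."""
    nodes = [
        Node("celt", "Celtic"),
        Node("goid", "Goidelic", parent="celt", order=1),
        Node("bryt", "Brythonic", parent="celt", order=2),
        Node("iris", "Irish", parent="goid", order=1),
        Node("scgl", "Scottish Gaelic", parent="goid", order=2),
        Node("wels", "Welsh", parent="bryt", order=1),
        Node("xold", "Some retired code", parent="goid", order=3,
             deprecated=True),
    ]
    return {n.code: n for n in nodes}


def ancestors(registry: Dict[str, Node], code: str) -> List[str]:
    """Follow parent pointers from a code up to its root."""
    chain: List[str] = []
    seen = {code}
    current = registry[code].parent
    while current is not None:
        if current in seen:
            raise ValueError("cycle in parent pointers at " + current)
        seen.add(current)
        chain.append(current)
        current = registry[current].parent
    return chain


def nearest_current(registry: Dict[str, Node], code: str) -> Optional[str]:
    """Return the code itself, or its nearest non-deprecated ancestor."""
    if not registry[code].deprecated:
        return code
    for ancestor in ancestors(registry, code):
        if not registry[ancestor].deprecated:
            return ancestor
    return None


def linearize(registry: Dict[str, Node]) -> List[str]:
    """Depth-first listing that visits siblings in their 'order' value."""
    children: Dict[str, List[Node]] = {}
    roots: List[Node] = []
    for node in registry.values():
        if node.parent is None:
            roots.append(node)
        else:
            children.setdefault(node.parent, []).append(node)

    result: List[str] = []

    def visit(node: Node) -> None:
        result.append(node.code)
        for child in sorted(children.get(node.code, []),
                            key=lambda n: n.order):
            visit(child)

    for root in sorted(roots, key=lambda n: n.order):
        visit(root)
    return result
```

The point about localised change shows up when an intermediate node is added: inserting a hypothetical "Insular Celtic" node means creating one record and repointing Goidelic and Brythonic at it. The record for Irish is not touched, yet its ancestor chain picks up the new node automatically.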

--
Doug Ewell | Thornton, CO, US | ewellic.org