ISO 639-5 reconfirmation ballot (long)
Caoimhin O Donnaile
caoimhin at smo.uhi.ac.uk
Tue Jul 19 17:32:45 CEST 2016
> Actually, ISO 639-6 used existing three-letter codes for languages when
> possible, and assigned four-letter codes for groupings (moving up the tree)
> and dialects and other variants (moving down the tree).
>
> So the only difference is that Caoimhín's system would use four-letter codes
> uniformly. It seems that 639-6 actually would have been a better fit for
> Anthony's use case of being able to identify an individual language code at
> sight.
Yes, and I can see Anthony’s point about not wanting to pollute a
relatively clean system of three-letter codes for languages with a small
number of almost unused codes for language groupings. (Although the
clean system is already slightly ‘polluted’ by codes for
“macrolanguages”.)
In the “ideal” system I was suggesting, the fact that codes for dialects
and languages and nodes in the genetic hierarchy would all be
four-letter and indistinguishable at sight is actually intended as a
desirable feature, very important for stability - although I recognise
that this has disadvantages as well as advantages.
It would mean that we would not have had all the anguish about whether
to assign a code for Elfdalian. It would mean that we would avoid
confusing situations such as whether:
et = est = ekk+vro (as SIL/Ethnologue/ISO seems to think)
et = ekk (as Google Translate and lots of others seem to think)
and lots of similar situations:
az/aze/azj/azb
ms/msa/zsm/...
ar/ara/arb/...
sq/sqi/aln/als
sw/swa/swh
sh...
> This sounds exactly like ISO 639-6. Why was that standard withdrawn,
> then?
I was only ever vaguely aware of ISO 639-6 but I never saw anything
usable come out of it. Perhaps it was too dependent on one or two
individuals who moved on? Perhaps it was too ambitious for the
available resources and infrastructure? The Wikipedia description
certainly sounds very like what I was suggesting myself, but I never saw
anything like that emerge, otherwise I might well be using it. I
vaguely remember seeing very detailed longer codes in a codespace too
dense to be extensible, and only available in PDF form. In 2012 the
database supporting the standard was still being promised in the coming
months:
http://web.archive.org/web/20120314165525/http://www.geolang.com/iso_639-6.php
but it looks as if geolang.com moved on to be a cyber security company.
I think for a genetic grouping system for languages to work, there is a
need to:
(1) Accept that it has to be a real-time, online, algorithmic system in
constant flux in response to ongoing specialist linguistic advice. The
system would not attempt to enumerate a list of all Celtic languages, for
example. The structure would actually be very simple, with each
code just pointing to its parent code. Irish would point to Goidelic,
which would point, possibly via intermediate nodes, to Celtic. The
effects of changes, such as whether there was a Continental/Insular
Celtic split, would thus be localised.
[Each code would also be labelled with an ‘order’ parameter, specifying
its order among its siblings in a sensible linearization of the tree,
which would keep the most similar languages closest together. This
useful feature is missing in all existing systems, meaning that people
have to fall back to ugly semi-alphabetic linearizations.]
(2) Accept that there would be an ongoing need to maintain basic
information on hundreds, or possibly thousands, of deprecated codes: just
their main inclusion relations to the nearest current non-deprecated
codes.
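To make points (1) and (2) a bit more concrete, here is a rough sketch in
Python of the kind of structure I have in mind. Everything in it is an
assumption for illustration only: the four-letter codes (celt, goid, iris,
...) are invented and belong to no standard, the field names parent, order
and included_in are my own, and the sibling order shown is arbitrary rather
than a linguistic claim.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    code: str                          # hypothetical four-letter code
    parent: Optional[str] = None       # parent code; None for a root or a deprecated code
    order: int = 0                     # position among siblings
    included_in: Optional[str] = None  # deprecated codes only: nearest current code


# Invented codes, purely for illustration.
TREE: Dict[str, Node] = {n.code: n for n in [
    Node("celt"),
    Node("goid", "celt", 1),
    Node("bryt", "celt", 2),
    Node("iris", "goid", 1),
    Node("scgl", "goid", 2),
    Node("manx", "goid", 3),
    Node("wels", "bryt", 1),
    Node("bret", "bryt", 2),
    Node("gael", included_in="goid"),  # a deprecated code
]}


def ancestors(tree: Dict[str, Node], code: str) -> List[str]:
    """Follow parent pointers from a code up to the root."""
    chain = []
    current = tree[code].parent
    while current is not None:
        chain.append(current)
        current = tree[current].parent
    return chain


def linearize(tree: Dict[str, Node], root: str) -> List[str]:
    """Depth-first listing of the subtree, siblings sorted by 'order'."""
    children = sorted(
        (n for n in tree.values() if n.parent == root and n.included_in is None),
        key=lambda n: n.order,
    )
    result = [root]
    for child in children:
        result.extend(linearize(tree, child.code))
    return result


def resolve(tree: Dict[str, Node], code: str) -> str:
    """Map a deprecated code to the nearest current code that includes it."""
    seen = set()
    while tree[code].included_in is not None:
        if code in seen:
            raise ValueError("cycle among deprecated codes: " + code)
        seen.add(code)
        code = tree[code].included_in
    return code


print(ancestors(TREE, "iris"))  # ['goid', 'celt']
print(linearize(TREE, "celt"))  # ['celt', 'goid', 'iris', 'scgl', 'manx', 'bryt', 'wels', 'bret']
print(resolve(TREE, "gael"))    # goid
```

Introducing an Insular Celtic node in this sketch would mean adding one
entry and changing the parent of goid and bryt; no other entry is touched,
which is what I mean by the effects of changes being localised.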
Caoimhín