Question on ISO-639:1988
jcowan at reutershealth.com
Thu Jun 3 18:43:37 CEST 2004
Lee Gillam scripsit:
> I don't know whether I've seen the (proposed? modified? adopted?)
> latest RFC 3066 (2001 seems to be the common one I can find),
RFC numbers, unlike ISO ones, are replaced absolutely when new versions are
issued. The predecessor of RFC 3066 was RFC 1766, and its successor will
get a number in the 3300s or higher, depending on when it is actually issued.
> Of course, for legacy systems, if ISO 639-1 were to be frozen in its
> current state,
It pretty well is. In order to add a new language now, it would have to be
one not already encoded in ISO 639-2 as well as being eligible by the
ISO 639-1 criteria.
> this would leave a number of unused alpha2s that could be adopted by
> individuals or organisations to refer to identifiers elsewhere
A very dangerous practice. Few modern databases are so restrictive that
they can use only 639-1 for technical reasons. The community has
painfully learned that unassigned codes of any sort (outside dedicated
private-use areas) should be left severely alone.
> I admit to finding the requirement for mnemonic labels slightly odd -
> international transport functions well with Toronto airport having
> a YYZ tag. This difference between having an identifier and having a
> specifically constructed identifier shows when one considers that the
> 450 or so alpha3s could be catered for as alpha2s if this requirement
> were removed.
It does provide a certain degree of robustness against coding errors.
I once (with shame I confess it) tagged about 30,000 Japanese resources
as "jp" rather than the correct "ja" ("jp" of course being the ISO 3166-1
code for Japan), but at least "jp" doesn't mean, say, "Buginese".
(Who knows how often our luggage destined for Oakland, California (OAK)
is sent to Oamaru, N.Z. (OAM) instead, to be retrieved only after
> And as a slight aside, I suspect France or Germany will win Euro 2004,
> with Italy and Portugal in the semi finals also. It would be nice if
> England won, but current form suggests they'll be home before the postcards.
Until today, I (ignorant Yank) didn't even know what it was.
The worst problem I see with the Linguasphere identifiers is the extreme
difficulty of relating the more general to the less general, as must be
done if requests are to be appropriately satisfied. It may make sense
to assign distinct 4-letter codes to such linguistic entities as:
Hiberno-English, spoken in Dublin
Hiberno-English, spoken in Dublin on the North Circular Road
Hiberno-English, spoken in Dublin on the North Circular Road (south side)
but a supplier of information that has content tagged with the last
code will not be able to reply to a request for simply "English" unless
it grasps this particular branch of the entire system (which leads up
to "Germanic" and "Indo-European" at higher levels, if I understand
In order to do this, it must have the Linguasphere key (hierarchical
identifier) corresponding to the 4-letter code, but this is (a) unstable
and (b) brittle, with its fixed maximum hierarchical depth of 8 and its
limited fanout of 10 to 26 siblings at each level.
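
A minimal sketch of the lookup described above, assuming a table that maps
each 4-letter code to its hierarchical key. Every code and key below is
invented for illustration and is not a real Linguasphere assignment; the
names KEY_FOR_CODE, is_within and satisfy are likewise hypothetical. The
point is only that a request for the general entity can be answered from
specifically tagged content if, and only if, the supplier holds the key
for both codes.

```python
# Hypothetical table: 4-letter code -> hierarchical key (one element per level).
# These values are made up for the example; they are NOT real Linguasphere data.
KEY_FOR_CODE = {
    "engl": ("5", "2", "A"),                  # "English" in general
    "hibe": ("5", "2", "A", "c"),             # Hiberno-English
    "dubl": ("5", "2", "A", "c", "d"),        # ... spoken in Dublin
    "ncrs": ("5", "2", "A", "c", "d", "k"),   # ... North Circular Road (south side)
    "wels": ("5", "0", "A"),                  # Welsh
}


def is_within(specific_code, general_code, keys=KEY_FOR_CODE):
    """Return True if specific_code lies at or below general_code in the tree.

    Without the key for either code the 4-letter identifiers are opaque,
    so the answer has to be False.
    """
    specific = keys.get(specific_code)
    general = keys.get(general_code)
    if specific is None or general is None:
        return False
    # The general key must be a prefix of the specific key.
    return specific[:len(general)] == general


def satisfy(request_code, available_codes, keys=KEY_FOR_CODE):
    """Return the available codes that fall under the requested entity."""
    return [code for code in available_codes if is_within(code, request_code, keys)]


if __name__ == "__main__":
    print(satisfy("engl", ["ncrs", "wels"]))
```

If the keys are reassigned (instability), the table goes stale and the
prefix test silently starts giving different answers, which is the
weakness noted above.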
In addition, any such hierarchical system that implements only one
hierarchy (a mixture of geographical and phylogenetic information,
and as far as the 2-digit value that forms the first two tree levels,
very ingeniously designed) will often produce the wrong answer. Thus,
if Irish information is requested and none is forthcoming, it is almost
certainly going to be better to return English (agreeing in the first
digit of the hierarchical code only) than Welsh (agreeing in the first
In short (and while I am not judging the system in full, not having seen
it in full), I very much suspect that for IT purposes the game will not
be worth the candle. I wish it were.
"Take two turkeys, one goose, four       John Cowan
cabbages, but no duck, and mix them      http://www.ccil.org/~cowan
together. After one taste, you'll duck   jcowan at reutershealth.com
soup the rest of your life."             http://www.reutershealth.com