Last call: Latvian (and Bontok) extlang subtags

Fri Feb 12 01:24:17 CET 2010

I think the bother stems from the fact that nearly all computer
applications are still quite unintelligent about the relationships
between languages.  They treat languages as if they were all just
unrelated "atoms".

That was fine in the ISO 639-1 days, but ISO 639-3 gives tags to
"languages" which, while very different and worth tagging
separately, are still close enough to make some use of each
other's resources.

So someone looking for lv would probably be happy with webpages
labelled lvs and quite likely also ltg.  In fact, someone looking
for lvs might be happy with ltg.  Someone looking for en might be
happy to have sco included, especially if results were scarce.
Someone looking for zh would likely be very happy with cmn.
Someone looking for sr would most likely be able to read hr.
Someone with poor Norwegian struggling to read a page labelled
nn might well be able to benefit from a dictionary labelled nb,
in the absence of anything better.

It looks as if the extended language subtags are an ad-hoc
expedient to build some intelligence into the tags themselves
in cases which are currently likely to cause bother.
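To make that concrete: if the proposed extlang records were registered
with Prefix lv, Latgalian could be tagged lv-ltg, and ordinary prefix
matching would then relate it to lv with no extra knowledge.  The
sketch below is RFC 4647 "basic filtering" written out in Python; the
function names are my own, and the tag list is only an illustration.

```python
from typing import List


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive exact match, or the
    range is a prefix of the tag followed by '-'.  '*' matches anything."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def basic_filter(language_range: str, tags: List[str]) -> List[str]:
    """Return the tags matched by the range, preserving input order."""
    return [t for t in tags if basic_filter_match(language_range, t)]


# With extlang forms, a request for "lv" picks up lv-lvs and lv-ltg,
# but a bare "ltg" tag is invisible to it: the relationship lives only
# in the shape of the tag.
print(basic_filter("lv", ["lv", "lv-lvs", "lv-ltg", "lt", "ltg"]))
```

Note that a range of lv-lvs still will not match lv-ltg, so the
"intelligence" only ever runs from macrolanguage to member.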

My vision of the future is that the tags would be atomic, but
the intelligence would come from an online database which
applications could download or consult.  This database would record
which languages are closely related in writing or in speech, and it
would list deprecated tags, saying how each relates to the currently
recommended tags for languages and macrolanguages.
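A rough sketch of how an application might use such a database, in
Python.  Everything here is hypothetical: the table names DEPRECATED
and RELATED and the functions canonical, acceptable and pick are
invented for illustration.  The related-language pairs are the
examples given above; iw -> he is a real deprecated-to-preferred
mapping in the IANA registry.  Only bare primary language subtags are
handled.

```python
from typing import Dict, List, Optional

# Hypothetical table: deprecated tag -> currently recommended tag.
DEPRECATED: Dict[str, str] = {
    "iw": "he",
}

# Hypothetical table: requested tag -> other tags a reader of it
# would probably accept, best first.  Relations are one-way.
RELATED: Dict[str, List[str]] = {
    "lv": ["lvs", "ltg"],
    "lvs": ["ltg"],
    "en": ["sco"],
    "zh": ["cmn"],
    "sr": ["hr"],
    "nn": ["nb"],
}


def canonical(tag: str) -> str:
    """Lower-case a tag and replace it if it is deprecated."""
    tag = tag.lower()
    return DEPRECATED.get(tag, tag)


def acceptable(requested: str) -> List[str]:
    """The requested tag followed by its related tags, best first."""
    tag = canonical(requested)
    return [tag] + [canonical(t) for t in RELATED.get(tag, [])]


def pick(requested: str, available: List[str]) -> Optional[str]:
    """Return the label of the best available resource, or None."""
    # Index available resources by canonical tag, keeping the first label seen.
    by_canonical: Dict[str, str] = {}
    for label in available:
        by_canonical.setdefault(canonical(label), label)
    for tag in acceptable(requested):
        if tag in by_canonical:
            return by_canonical[tag]
    return None


print(pick("lv", ["ltg", "en"]))   # a Latgalian page for a Latvian reader
print(pick("nn", ["nb", "en"]))    # a Bokmål dictionary for a Nynorsk reader
```

The relations are kept one-way because the examples above are stated
in one direction; a real database would have to decide pair by pair
whether intelligibility runs both ways.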

Caoimhín