Question on ISO-639:1988

Jeremy Carroll jjc at
Wed Jun 2 18:52:14 CEST 2004

Debbie Garside wrote:

> Addison
> LS 639 (proposed as ISO 639-6 - Aug 2004) deals very well with language 
> varieties - written and spoken (signed, audio and visual to be 
> included).  Anyone interested in the development of this new standard 
> may like to read the paper/workshop presented at LREC in Lisbon in order 
> to see exactly what is being proposed.  Visit 
> <> - all comments and criticism most welcome 
> at this stage of development.  Please feel free to sign up for the forum 
> - although it has only just been created.
> Debbie

I had a quick read of these papers and was somewhat underwhelmed.

From a technical, rather than a linguistic, viewpoint, I worried about 
use cases. None seem to be identified. A simple one would be, say, 
finding some text in a DB that a particular reader might understand.

This use case seems highly dependent on the metadata describing the 
relationships between the linguistic tags, which are described as being 
historically variable. It is not clear that the intended standard 
actually will include such metadata of sufficient quality to make any 
tagged text usable in any meaningful sense.

While the 3066bis approach of hard-coding some relationships into the 
way the language identifier is created from subtags has some obvious 
weaknesses, it will at least *work*, reasonably well, most of the time. 
Moreover, a small-footprint implementation is possible without any 
linguistic knowledge whatsoever.
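
To illustrate the small-footprint point, here is a minimal sketch, assuming the RFC 3066 style of range matching (a range matches a tag if it equals the tag, or equals a prefix of it that is immediately followed by "-"). The function name is mine, not anything from 3066bis, and 3066bis itself may refine the rule.

```python
def matches(language_range: str, tag: str) -> bool:
    """Return True if `tag` falls under `language_range`.

    Pure string comparison on subtags: no linguistic data is consulted.
    The name `matches` is hypothetical, chosen for this sketch only.
    """
    r = language_range.lower()  # language tags compare case-insensitively
    t = tag.lower()
    if r == "*":                # the wildcard range matches any tag
        return True
    # Equal, or a prefix that ends exactly on a subtag boundary.
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for rng, tag in [("en", "en-GB"), ("en", "eng"), ("en-GB", "en")]:
        print(rng, tag, matches(rng, tag))
```

Note that "en" does not match "eng": the prefix has to stop at a hyphen, which is the whole of the hard-coded relationship.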

The approach in this work will require applications that either only 
permit a simple match on the four-character tag (which, given the number 
of tags, seems useless) or will require a library that instantiates the 
metadata describing the relationships between the tags. I guess it will 
keep some people in jobs.
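
By contrast, a rough sketch of the two options just described. Everything here is an assumption for illustration: the four-character codes and the parent table are invented, and I do not know what form the real metadata would take.

```python
# Hypothetical relationship metadata: each invented code maps to an
# invented parent grouping. None of these are real codes.
PARENT = {
    "aaab": "aaaa",
    "aaac": "aaaa",
    "aaad": "aaab",
}


def exact_match(wanted: str, tag: str) -> bool:
    """Option 1: simple equality on the four-character tag."""
    return wanted == tag


def is_within(tag: str, group: str, parent: dict) -> bool:
    """Option 2: walk up the parent table from `tag` looking for `group`.

    This only works if the application ships (and keeps current) the table.
    """
    seen = set()
    current = tag
    while current is not None and current not in seen:
        if current == group:
            return True
        seen.add(current)          # guard against cycles in the data
        current = parent.get(current)
    return False


if __name__ == "__main__":
    print(exact_match("aaaa", "aaad"))        # equality alone misses the relationship
    print(is_within("aaad", "aaaa", PARENT))  # found only via the table
```

The second function is no harder to write than a prefix match, but it is useless without the table, and the table is exactly the metadata whose quality I am worried about.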

While I feel less qualified in this group to discuss the linguistic 
issues, I did feel that the old chestnut of "what is a language?" was 
receiving an arbitrary answer (approximately, a language grouping is 
about 1/25,000 of the linguistic variability world-wide). As far as I am 
aware, any answer to this question is arbitrary: we could go down as far 
as idiolects, or even idiolects for particular purposes (this e-mail is 
written with a different choice of vocabulary, a different set of 
emotional choices, and different grammatical structures than e-mail I send 
to, say, an HP Semantic Web e-mail list). However, there seemed little 
acknowledgement of the arbitrariness of the choice, and a lot of 
trumpet blowing about the benefits that would accrue from such choices 
(despite the lack of use cases in which to ground these benefits).

I guess I should look into how to influence the BSI decision.

Apologies that this is an impressionistic account, rather than quoting 
chapter and verse from your work to capture my worries. If there is 
stuff that seems wholly off-beam, I guess I can try to justify my 
statements from your papers.


(based in Bristol UK)

More information about the Ietf-languages mailing list