Question on ISO-639:1988
jjc at hplb.hpl.hp.com
Wed Jun 2 18:52:14 CEST 2004
Debbie Garside wrote:
> LS 639 (proposed as ISO 639-6 - Aug 2004) deals very well with language
> varieties - written and spoken (signed, audio and visual to be
> included). Anyone interested in the development of this new standard
> may like to read the paper/workshop presented at LREC in Lisbon in order
> to see exactly what is being proposed. Visit www.linguasphere.com
> <http://www.linguasphere.com> - all comments and criticism most welcome
> at this stage of development. Please feel free to sign up for the forum
> - although it has only just been created.
I had a quick read of these papers and was somewhat underwhelmed.
From a technical, rather than a linguistic viewpoint, I worried about
use cases. None seem to be identified. A simple one would be, say,
finding some text in a DB that a particular reader might understand.
This use case seems highly dependent on the metadata describing the
relationships between the linguistic tags, which are described as being
historically variable. It is not clear that the intended standard
will actually include metadata of sufficient quality to make any
tagged text usable in any meaningful sense.
While the 3066bis approach of hard-coding some relationships into the
way the language identifier is created from subtags has some obvious
weaknesses, it will at least *work*, reasonably well, most of the time.
Moreover, a small-footprint implementation is possible without any
linguistic knowledge whatsoever.
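
To illustrate the small-footprint point: RFC 3066 style matching needs nothing more than a prefix comparison on the tag string. What follows is a minimal sketch, not code from any draft; the function name `matches` and the sample tags are assumptions made for illustration.

```python
def matches(language_range: str, tag: str) -> bool:
    """Prefix matching in the style of RFC 3066 language ranges.

    A range matches a tag if the two are equal, or if the range is a
    prefix of the tag and the next character in the tag is "-".
    The special range "*" matches any tag. Comparison ignores case.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


# Hypothetical tags attached to texts in a DB.
tags = ["en", "en-GB", "en-US", "eng", "fr-CA"]

# A reader who asks for "en" gets every English variant, but not "eng",
# because the character after the prefix is not "-".
print([t for t in tags if matches("en", t)])  # ['en', 'en-GB', 'en-US']
```

No table of languages is consulted anywhere: the relationship between "en" and "en-GB" is carried entirely by the way the identifier is built from subtags.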
The approach in this work will require applications either to permit
only a simple match on the four-character tag (which, given the number
of tags, seems useless) or to use a library that instantiates the
metadata describing the relationships between the tags. I guess it will
keep some people in jobs.
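
The two options can be contrasted in a rough sketch. Everything here is hypothetical: the four-letter codes are invented placeholders, not real LS 639 tags, and the `PARENT` table stands in for whatever relationship metadata the standard might eventually ship. "Shares an ancestor" is used only as a stand-in for "a reader of one might understand the other".

```python
from typing import Dict, List, Optional

# Hypothetical relationship metadata: each code maps to its parent
# grouping, or None at the top. These codes are invented for illustration.
PARENT: Dict[str, Optional[str]] = {
    "aaaa": None,
    "aaab": "aaaa",
    "aaac": "aaaa",
    "bbbb": None,
}


def exact_match(reader_code: str, text_code: str) -> bool:
    """Option 1: opaque comparison, no metadata needed."""
    return reader_code == text_code


def ancestors(code: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the code followed by its chain of parents."""
    chain: List[str] = []
    current: Optional[str] = code
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = parent.get(current)
    return chain


def related(a: str, b: str, parent: Dict[str, Optional[str]]) -> bool:
    """Option 2: the codes are related if their ancestor chains overlap."""
    return bool(set(ancestors(a, parent)) & set(ancestors(b, parent)))


print(exact_match("aaab", "aaac"))      # False
print(related("aaab", "aaac", PARENT))  # True
print(related("aaab", "bbbb", PARENT))  # False
```

The second option is only as good as the table behind it, which is the worry about metadata quality raised above.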
While I feel less qualified in this group to discuss the linguistic
issues, I did feel that the old chestnut of "what is a language?" was
receiving an arbitrary answer (approximately a language grouping is
about 1/25,000 of the linguistic variability world-wide). As far as I am
aware, any answer to this question is arbitrary: we could go down as far
as idiolects, or even idiolects for particular purposes (this e-mail is
written with a different choice of vocabulary, a different set of
emotional choices, and different grammatical structures than the e-mail
I send to, say, an HP Semantic Web e-mail list). However, there seemed
little acknowledgement of the arbitrariness of the choice, and a lot of
trumpet blowing about the benefits that would accrue from such choices
(despite the lack of use cases in which to ground these benefits).
I guess I should look into how to influence the BSI decision.
Apologies that this is an impressionistic account, rather than quoting
chapter and verse from your work to capture my worries. If there is
stuff that seems wholly off-beam I guess I can try and justify my
statements from your papers.
(based in Bristol UK)