Question on ISO-639:1988
l.gillam at eim.surrey.ac.uk
Tue Jun 8 10:49:58 CEST 2004
> A summary statement of need for ISO 639-3 was included in the NWIP.
The NWIP statement was, IMO, a bit vague in respect of users, use cases
and the like:
There has been a growing and urgent demand among users for language
identifiers representing a wide variety of languages spoken in the world,
including many lesser-known languages that are out of scope for either ISO
639-1 or ISO 639-2. User communities for which there is a known need include
the academic linguistics community, the language resource community, software
developers, governmental and non-governmental agencies, and industry
standardisation bodies. The need for a new part to ISO 639 covering all
individual languages is urgent since there is a growing risk that different
user groups will begin (and some have already begun) to develop different
A summary statement was contained in a document presented 5 August 2003,
and although the "world" has changed since then, it would still seem
to be relevant:
For a number of linguistic, language resource, and language technology
applications it is inadequate to specify languages only down to the level that
is provided in ISO 639-1, ISO 639-2, ISO 639-3 (proposed), and ISO 639-5
(proposed). There is a need for methods to specify and interchange information
on a language variant level.
This need would seem to have been determined by SC2, though I'm not
aware of the specific discussions about it.
> - ISO 639-3 is to be simply an extension of ISO 639-2.
> - It uses exactly the same mechanisms (alpha-3).
I have been wondering how we will know whether an alpha-3 comes from 639-2
or 639-3 if the mechanisms are the same?
Would this question not also be relevant to what is being supported in
RFC 3066? Perhaps there is an answer somewhere already?
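
To make the provenance question concrete, here is a minimal sketch in Python. It assumes each part of ISO 639 is available as a plain set of alpha-3 codes; the two sets below are tiny illustrative samples, not the real code tables, and the function name is hypothetical. The point is that the code string alone carries no marker of its origin, so the only way to tell is to look it up in each table.

```python
# Illustrative samples only (assumption): the real ISO 639-2 table and the
# proposed ISO 639-3 table are far larger than these.
ISO_639_2_SAMPLE = {"eng", "fra", "deu"}
ISO_639_3_SAMPLE = {"eng", "fra", "deu", "cmn"}


def parts_containing(code):
    """Return the ISO 639 parts (of the two sampled here) that list `code`."""
    code = code.lower()
    parts = []
    if code in ISO_639_2_SAMPLE:
        parts.append("639-2")
    if code in ISO_639_3_SAMPLE:
        parts.append("639-3")
    return parts


print(parts_containing("eng"))  # listed in both sample sets
print(parts_containing("cmn"))  # listed only in the 639-3 sample
print(parts_containing("zzz"))  # listed in neither sample
```

If 639-3 is a strict extension of 639-2, a code found in both is ambiguous only in provenance, not in meaning, which may be all an application needs to know.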
> > Peter, on a lighter note, anybody noting that you work for Microsoft
> > well ask the question about functionality creep in MS products -
> > about 90% of users only using 10% of the functions. I couldn't
.... and so whatever is provided, 90% of users will probably not need it.
Probably 90% will be unlikely to use all the SIL identifiers.
Would that mean providing only the 10% that they might need, and
ignoring the niche group?
A need, as set out above, was to prevent, where possible, fragmentation
through re-invention. Those wishing to use any subset of identifiers should
be able to - and this is part of the philosophy running in TC37 SC4 w.r.t.
Language Resources in general.
> For one thing, existence of implementations designed to accommodate
> alpha-3 and a positive lack of desire to re-engineer to support alpha-4
> when there wasn't a clear need for alpha-4 for the application contexts
> in question.
There is no need to re-engineer - alpha-2 and alpha-3 are not being
subsumed, but mapped. If you have an alpha-2 code, use it. If you have
alpha-3, use it. This is a similar basis to RFC 3066, from what I see of
recent discussions. As above, those who don't need it will have no additional
work to do.
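
A rough sketch of "mapped, not subsumed", in Python. The alpha-2 to alpha-3 pairs are real ISO 639-1/639-2 correspondences, but the alpha-4 values are placeholders invented purely for illustration, and the function and table names are hypothetical. The idea is that the mapping sits alongside existing identifiers as an optional lookup, so code that already handles alpha-2 or alpha-3 is left untouched.

```python
# Real ISO 639-1 -> ISO 639-2 correspondences.
ALPHA2_TO_ALPHA3 = {"en": "eng", "fr": "fra"}

# Placeholder alpha-4 values (assumption): NOT real identifiers.
ALPHA3_TO_ALPHA4 = {"eng": "aaaa", "fra": "aaab"}


def to_alpha4(code):
    """Map an alpha-2 or alpha-3 code to an alpha-4 code, or None if unmapped."""
    code = code.lower()
    if len(code) == 2:
        code = ALPHA2_TO_ALPHA3.get(code)
        if code is None:
            return None
    return ALPHA3_TO_ALPHA4.get(code)


# An application that only needs alpha-2 or alpha-3 never calls to_alpha4().
print(to_alpha4("en"))
print(to_alpha4("fra"))
print(to_alpha4("xx"))
```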
> Well, the problem is that, unless the symbolic IDs are all long or of
> variable length, with a large set of names, you inevitably run out of
> distinct ways to remain mnemonic. IIRC, for alpha-3 IDs for language
> names (at the ~7000 level of granularity), there aren't enough IDs
> beginning with "m" to cover all the names that begin with "m".
Now, had you used alpha-4 ...... ;-)
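
For what it's worth, the arithmetic behind that quip can be sketched as follows, assuming identifiers are drawn from the 26 basic Latin letters with no reserved ranges (real code spaces do reserve some ranges, so the usable numbers would be somewhat smaller). The function name is hypothetical.

```python
import string

LETTERS = len(string.ascii_lowercase)  # 26


def codes_with_fixed_first_letter(length):
    """Count fixed-length identifiers that start with one given letter."""
    return LETTERS ** (length - 1)


print(codes_with_fixed_first_letter(3))  # alpha-3 codes beginning with "m": 676
print(codes_with_fixed_first_letter(4))  # alpha-4 codes beginning with "m": 17576
print(LETTERS ** 3)                      # all alpha-3 codes: 17576
```

So under these assumptions, alpha-4 offers as many identifiers beginning with "m" as alpha-3 offers in total.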