Question on ISO-639:1988

Tue Jun 8 10:49:58 CEST 2004

> A summary statement of need for ISO 639-3 was included in the NWIP. 

The NWIP statement was, IMO, a bit vague in respect of users, use cases
and the like:

	There has been a growing and urgent demand among users for language 
identifiers representing a wide variety of languages spoken in the world, 
including many lesser-known languages that are out of scope for either ISO 
639-1 or ISO 639-2. User communities for which there is a known need include 
the academic linguistics community, the language resource community, software 
developers, governmental and non-governmental agencies, and industry 
standardisation bodies. The need for a new part to ISO 639 covering all 
individual languages is urgent since there is a growing risk that different 
user groups will begin (and some have already begun) to develop different 
incompatible codes.

A summary statement was contained in a document presented 5 August 2003,
and although the "world" has changed since then, it would still seem
to be relevant:

	For a number of linguistic, language resource, and language technology 
applications it is inadequate to specify languages only down to the level that 
is provided in ISO 639-1, ISO 639-2, ISO 639-3 (proposed), and ISO 639-5 
(proposed). There is a need for methods to specify and interchange information 
on a language variant level.

This need would seem to have been determined by SC2, though I'm not
aware of the specific discussions about it.

> - ISO 639-3 is to be simply an extension of ISO 639-2. 
> 
> - It uses exactly the same mechanisms (alpha-3).

I have been wondering how we will know whether an alpha-3 comes from 639-2 
or 639-3 if the mechanisms are the same? 

Would this question not also be relevant to what is being supported in 
RFC 3066? Perhaps there is an answer somewhere already?

> > Peter, on a lighter note, anybody noting that you work for Microsoft may
> > well ask the question about functionality creep in MS products - maxims
> > about 90% of users only using 10% of the functions. I couldn't possibly...
> 
> :-) 

.... and so whatever is provided, 90% of users will probably not need it.
Probably 90% are unlikely to use all of the SIL identifiers either.
Would that mean providing only the 10% that they might need, and 
ignoring the niche group?

A need, as set out above, was to prevent, where possible, fragmentation
through re-invention. Those wishing to use any subset of identifiers should
be able to do so - and this is part of the philosophy running in TC37 SC4
w.r.t. Language Resources in general.

> For one thing, existence of implementations designed to accommodate
> alpha-3 and a positive lack of desire to re-engineer to support alpha-4
> when there wasn't a clear need for alpha-4 for the application contexts
> in question.

There is no need to re-engineer - alpha-2 and alpha-3 are not being
subsumed, but mapped. If you have an alpha-2 code, use it. If you have
alpha-3, use it. Similar basis to 3066 from what I see of recent
discussions. As above, those who don't need it will have no additional
work to do.
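
As a rough illustration of "mapped, not subsumed", here is a minimal
sketch, not anything taken from a draft. The function names and the
idea of guessing the code family from its length are my own assumptions
for the example. The three alpha-2/alpha-3 pairs are real ISO 639-1 and
639-2 (terminology) codes. No alpha-4 values are shown, since I would
only be inventing them.

```python
# Illustrative subset only: real ISO 639-1 -> ISO 639-2 (terminology) pairs.
ALPHA2_TO_ALPHA3 = {"en": "eng", "fr": "fra", "de": "deu"}

# Assumption of this sketch: the code family is inferred purely from length.
FAMILY_BY_LENGTH = {2: "alpha-2", 3: "alpha-3", 4: "alpha-4"}


def scheme_of(code):
    """Return 'alpha-2', 'alpha-3' or 'alpha-4' for a lowercase a-z code."""
    if not (code.isascii() and code.isalpha() and code.islower()):
        raise ValueError(f"not a lowercase a-z code: {code!r}")
    if len(code) not in FAMILY_BY_LENGTH:
        raise ValueError(f"unsupported code length: {code!r}")
    return FAMILY_BY_LENGTH[len(code)]


def to_alpha3(code):
    """Map a code to alpha-3; an existing alpha-3 passes through untouched."""
    family = scheme_of(code)
    if family == "alpha-3":
        return code  # nothing to re-engineer for existing alpha-3 users
    if family == "alpha-2":
        return ALPHA2_TO_ALPHA3[code]  # KeyError if outside the sample table
    raise LookupError("this sketch carries no alpha-4 mapping table")


print(to_alpha3("en"), to_alpha3("eng"), scheme_of("abcd"))
```

Note that length separates alpha-2, alpha-3 and alpha-4 from one another,
but it cannot say whether a given alpha-3 came from 639-2 or 639-3, which
is the question I raised earlier.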

> Well, the problem is that, unless the symbolic IDs are all long or of
> variable length, with a large set of names, you inevitably run out of
> distinct ways to remain mnemonic. IIRC, for alpha-3 IDs for language
> names (at the ~7000 level of granularity), there aren't enough IDs
> beginning with "m" to cover all the names that begin with "m".

Now, had you used alpha-4 ......  ;-)
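
To put rough numbers on that, a minimal sketch, assuming codes drawn
only from the 26 lowercase letters a-z (the helper name count_codes is
mine, purely for illustration). It counts how many codes of a given
length can start with a given prefix. It makes no claim about how many
language names actually begin with "m".

```python
from string import ascii_lowercase


def count_codes(length, prefix=""):
    """Count a-z codes of the given length that start with the prefix."""
    free_positions = length - len(prefix)
    if free_positions < 0:
        return 0  # the prefix is already longer than the code
    return len(ascii_lowercase) ** free_positions


alpha3_total = count_codes(3)      # every possible alpha-3 code
alpha3_m = count_codes(3, "m")     # alpha-3 codes beginning with "m"
alpha4_m = count_codes(4, "m")     # alpha-4 codes beginning with "m"

print(alpha3_total, alpha3_m, alpha4_m)  # prints: 17576 676 17576
```

So one extra letter gives the "m" names alone as much room as the whole
of alpha-3 has today.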