Question on ISO-639:1988

Peter Constable petercon at microsoft.com
Tue Jun 1 20:49:21 CEST 2004


> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Debbie Garside

> >>I looked at the description of Irish that is or was on the
> >>Linguasphere website once, and it divided things up so much that I
> >>find it hard to see a use for it.
> 
> There are many uses including cataloguing etc. and these uses will become
> more apparent as we start to look more at mobile technologies and the
> cataloguing for spoken/audio/signed/written media.

I've yet to see from the people involved with this project a realistic
evaluation of IT needs. The LREC paper asserts (in the abstract):

"The international community, including the International Organization
for Standardization (ISO), is currently seeking more granular systems of
language identifiers than the widely used tags of ISO 639 parts 1 and 2.
There is growing need for the more precise identification and annotation
of language-based resources."

A *system* capable of supporting fine granularity is needed, and RFC
3066 already provides that. The abstract is not referring to just a
system, though, but also to a coding: the point of the Linguasphere
proposal is both a *system* and a *coding* for broad and highly-granular
coverage of language varieties.

There is no question that there is wide need for *broader* coverage than
is provided in ISO 639-1/-2. That can be viewed as more granular
coverage than is provided by the collective categories in ISO 639-2, but
without involving finer levels of granularity than are already provided
in ISO 639-1/-2. There is *occasional* need for finer levels of
granularity, and the request for sl-nedis is an instance of this. What I
have not at all seen, however, is that there is a widespread need for
coding coverage that is both broad and highly granular.

The paper states, 

"It is important therefore to step back from the ongoing formulation of
the standard and its necessarily intricate and formalised procedures, in
order to reflect on what is to be provided, by what means, and most
importantly, for what purposes."

but then without *any* evaluation of purposes proceeds to propose
particular design principles that emphasize the particular philosophic
bent of the Linguasphere. The discussion of present and future needs
identifies a need on which I think there is reasonable consensus:

"The observation and understanding of linguistic phenomena requires the
transparent, accurate and unambiguous identification of every spoken,
written and sign language..."

But it then proceeds to extend the scope into what is the particular
bent of the Linguasphere:

"...including each component variety, community and recorded corpus,
from the most globalised to the most localised."


Effectively, it all but dispenses with any emic / etic distinction,
suggesting that not only do IT systems need to support any and all
documented distinctions among linguistic varieties, but that these
should all be coded. The authors assert that "the number of entities to
be identified... is already in excess of 25,000" before having even
attempted to establish what the requirements for coding are. The need
simply has not been demonstrated.

Frankly, I know of nobody in industry who has been looking for 25,000
linguistic varieties or identities to support, let alone 70,000 or
450,000. I have indicated to Debbie more than once that an alpha-4
scheme creates compatibility issues with existing implementations that
industry is heavily invested in, yet they appear to be committed to that
approach. They are suggesting a new set of IDs covering a lot of
distinctions for which no clear need has been shown, and using a coding
scheme that would require entirely new protocols and implementations.
Will it surprise anyone if reaction from industry lacks enthusiasm?
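
To illustrate the alpha-4 concern, here is a minimal Python sketch of how
RFC 3066 interprets the primary subtag of a language tag. It assumes the
alpha-4 codes would be carried as primary subtags, which the proposal
discussed here may or may not intend; the function name and the tag
"abcd" are hypothetical and are not taken from any proposal or existing
implementation. Under RFC 3066 a four-letter primary subtag is
well-formed but has no assigned meaning, so it could not be given one
without revising the standard.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
_TAG_RE = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")


def classify_primary_subtag(tag):
    """Return how RFC 3066 interprets the primary subtag of `tag`.

    Hypothetical helper for illustration only.
    """
    if not _TAG_RE.match(tag):
        return "malformed"
    primary = tag.split("-", 1)[0].lower()
    if primary == "i":
        return "IANA-registered"
    if primary == "x":
        return "private use"
    if len(primary) == 2:
        return "ISO 639-1"
    if len(primary) == 3:
        return "ISO 639-2"
    # Any other primary subtag is syntactically valid but cannot be
    # assigned a meaning without revising RFC 3066.
    return "unassigned"


if __name__ == "__main__":
    # "abcd" is a made-up alpha-4 code, used only as an example.
    for tag in ("sl-nedis", "i-klingon", "x-example", "abcd"):
        print(tag, "->", classify_primary_subtag(tag))
```

A tag such as sl-nedis fits the existing scheme: the fine-grained
distinction lives in a registered subtag after a two-letter ISO 639-1
primary subtag, which is the kind of occasional finer granularity
RFC 3066 already accommodates.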

I'm also very concerned at the representations given of various projects
in this paper and elsewhere: Debbie has repeatedly referred to the
Linguasphere / BSI proposal as "the new standard", speaking in an ISO
context, even though there is no approved ISO work item related to it.
This paper suggests that it has the same status as work being done on
other projects that are referred to as "proposed" when in fact those
other projects are approved ISO TC 37/SC 2 work items. The BSI proposal
may well become an approved work item and eventually a published ISO
standard, but until it is an approved work item, I think it is
inappropriate to speak as though the road to ISO publication is just a
matter of time and process.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division

