Question on ISO-639:1988
petercon at microsoft.com
Fri Jun 4 23:30:56 CEST 2004
> From: Lee Gillam [mailto:l.gillam at eim.surrey.ac.uk]
> Out of interest, is there a document of the needs analysis for 639-3
> readily available? Perhaps this would be a good starting point for
> producing something of a similar/related nature. Google gave me
> "needs analysis" ethnologue ISO
> "needs analysis" ethnologue IETF
A summary statement of need for ISO 639-3 was included in the NWIP. A
detailed needs analysis was never requested, and I think it's fair to
say that's because it didn't particularly require significant analysis:
- ISO 639-3 is to be simply an extension of ISO 639-2.
- It uses exactly the same mechanisms (alpha-3).
- It does not introduce any significantly different levels of
granularity. (In the course of working on it, the working group found it
helpful to add a new notion of granularity, the "macrolanguage", but
that level of granularity was already present; identifying it, giving it
a name and establishing the relationships simply makes clearer what was
- The main purpose was to provide more complete coverage at the existing
levels of granularity, and there were many parties indicating a need for
that -- these are known industry needs coming from several sectors.
The proposed ISO 639-6 differs in each of those respects.
> A slightly contrived example might be: I have a recorded collection of
> speech of Middle Chulym, which for the sake of argument has a tag
> On the one hand I wish to provide a description of it in English. On
> other hand, I'd like to provide a description in French, but perhaps
> I'm not very good at French (true) and so I'll leave a placeholder
> might be treated as a comment. Now, I don't like providing XML-type
> examples for human readability, but suppose for interchange I've
> some format that might contain a fragment:
> <resource speech_lang_tag="myluhc">
> <store lang="en">
> <description xml:lang="en">Lots of words
> <store lang="fr">
> <description xml:lang="en">Sorry, don't know French
> As I said, contrived, but perhaps this can help us propose an answer
> the question.
So, what we'd want to look at are the usage scenarios related to a
record of this nature, asking questions such as:
- Which users, in what kinds of applications, will want to retrieve a
record like this based on the semantics of "myluhc"? Are those semantics
that will be used by a diverse group of users in diverse application
scenarios, or just a small group of users (e.g. one particular research
project) in one type of application? (Of course, the answer is likely
somewhere in between.)
- How many users wanting to use semantics of that level of granularity
want to use exactly the same semantics? E.g. if "Middle Chulym" (taking
a wild guess) refers to Russian spoken in the environs of the stretch of
the Chulym River roughly between Zyryanskoye and Achinsk, do many others
also want to distinguish that variety from what is spoken in neighboring
environs, or are there some that want to make distinctions that
cross-cut those, say along the Tomsk-Krasnoyarsk border? (I'm speaking
in hypothetical terms of course -- I know nothing of the varieties in
- Do the users wanting to utilize an identifier with semantics of
"myluhc" also need a comprehensive set of identifiers that cover the
entire globe at a similar level of granularity, or do they just need
identifiers for that variety and a few others closely related to it?
Of course, we can't get complete, factual answers to all such questions,
but we have to make a reasonable attempt to come up with what we believe
to be sound judgements about them.
(I think the interaction of the second and third issues raised by
those questions is particular ground for doubt: do lots of people *at
this level of granularity* really want to partition the entire globe,
and in just the same manner?)
> Peter, on a lighter note, anybody noting that you work for Microsoft
> well ask the question about functionality creep in MS products -
> about 90% of users only using 10% of the functions. I couldn't
> > At one time, alpha-4 was
> > suggested for ISO 639-3, but that was abandoned when it became clear
> > that doing so would create obstacles to implementation with no
> > significant benefit.
> What were the obstacles in this case, and how do they relate to the
> above private use extensions?
For one thing, the existence of implementations designed to accommodate
alpha-3 and a positive lack of desire to re-engineer to support alpha-4
when there wasn't a clear need for alpha-4 for the application contexts
> As I understand things, there have been prior discussions about how
> certain codes MUST be mnemonic. I sit firmly on the *non* side, but
> then probably incur the wrath of those who want them.
Well, the problem is that, unless the symbolic IDs are all long or of
variable length, with a large set of names, you inevitably run out of
distinct ways to remain mnemonic. IIRC, for alpha-3 IDs for language
names (at the ~7000 level of granularity), there aren't enough IDs
beginning with "m" to cover all the names that begin with "m".
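The arithmetic behind this can be sketched as follows. This is only a
minimal illustration: it assumes identifiers are built from the 26
lowercase letters a-z and nothing else, and the function name
codes_with_prefix is hypothetical, not part of any standard or library.

```python
import string

# Assumption: identifiers use only the 26 lowercase Latin letters a-z.
ALPHABET = string.ascii_lowercase


def codes_with_prefix(prefix: str, length: int = 3) -> int:
    """Count fixed-length identifiers that start with the given prefix."""
    if len(prefix) > length:
        return 0
    # Each remaining position can hold any letter of the alphabet.
    return len(ALPHABET) ** (length - len(prefix))


total_alpha3 = codes_with_prefix("")      # all alpha-3 identifiers: 26**3
starting_with_m = codes_with_prefix("m")  # alpha-3 identifiers "m??": 26**2

print(total_alpha3, starting_with_m)  # prints: 17576 676
```

So the alpha-3 space as a whole (17,576) is comfortably larger than
~7000, but only 676 of those identifiers can begin with "m". Once more
than 676 names in the set begin with "m", some of them must take an
identifier that starts with a different letter.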
Globalization Infrastructure and Font Technologies
Microsoft Windows Division