Ietf-languages Digest, Vol 74, Issue 1

John Cowan cowan at ccil.org
Sun Feb 22 02:03:56 CET 2009


[Quoted fragments have been reordered]

Anthony Aristar scripsit:

> [T]he code-set is a mish-mash that is very reminiscent of the mess
> that ISO 639-1/2 were before ISO 639-3 came along [...].

Not surprising: it's the same mish-mash, just with additional codes for
some well-known groupings.

The purpose of 639-5, at least in connection with BCP 47, is to make it
possible to tag documents whose language has not been determined exactly.
It allows vagueness.  You may not know the exact language of a document,
but perhaps you at least know that it is written in a North American
Indian language, so you can tag it "nai" or perhaps "nai-Latn" or
"nai-fonipa" to add information about the transcription.  That gives
someone classifying or retrieving the document something more to go on
than a flat "und" or other indicator of absence.
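
As a rough illustration of that fallback, here is a minimal Python sketch.
The function name make_tag and its parameters are invented for this
example, and it does no validation against the subtag registry; it only
shows the idea of using the most specific code known and falling back to
"und" when nothing is known.

```python
def make_tag(language=None, collection=None, script=None, variant=None):
    """Build a BCP 47-style tag from whatever is known about a document.

    Hypothetical helper: it performs no registry lookup or validation.
    """
    # Prefer an exact language code, then a collection code, then "und".
    primary = language or collection or "und"
    subtags = [primary.lower()]
    if script:
        # Script subtags are conventionally written in title case ("Latn").
        subtags.append(script.title())
    if variant:
        # Variant subtags are conventionally lowercase ("fonipa").
        subtags.append(variant.lower())
    return "-".join(subtags)


print(make_tag(collection="nai"))                    # nai
print(make_tag(collection="nai", script="latn"))     # nai-Latn
print(make_tag(collection="nai", variant="fonipa"))  # nai-fonipa
print(make_tag())                                    # und
```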

For classification, it doesn't much matter if a group is genetic or not.
Indeed, genetic groupings may be singularly unhelpful, not to mention
unstable, in parts of the world where the relationships between languages
are not yet firmly established.  And in the BCP 47 world, we value
stability at least slightly higher than truth.

> [T]he use of Alpha-3 makes the codes easily confusable with ISO
> 639-3.  I know of at least one project that simply won't use them
> because of this.

Whereas that is very convenient for BCP 47 purposes: the "primary
language" subtag can be a collection, a macrolanguage, or an individual
language without having to have variable syntax (except for the use of
639-1 two-letter codes, which is retained for backward compatibility).
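
To make the point concrete, here is a small Python sketch. It assumes
only the two- and three-letter primary subtags discussed here and ignores
every other form a tag can take. The SCOPE table is a hand-written
illustration, not the registry, and the names primary_subtag and describe
are invented for the example. The syntax check is the same for every
code; telling a collection from an individual language takes a lookup.

```python
import re

# Two or three ASCII letters: the only primary-subtag shape considered here.
PRIMARY_RE = re.compile(r"^[A-Za-z]{2,3}$")

# Illustrative table only; a real application would consult the registry.
SCOPE = {
    "nai": "collection",
    "zh": "macrolanguage",
    "cmn": "individual",
    "en": "individual",
}


def primary_subtag(tag):
    """Return the first subtag of a tag, lowercased."""
    return tag.split("-", 1)[0].lower()


def describe(tag):
    """Return (primary subtag, scope), using the illustrative SCOPE table."""
    primary = primary_subtag(tag)
    if not PRIMARY_RE.match(primary):
        raise ValueError("unsupported primary subtag: %r" % primary)
    # The syntax check is identical for all scopes; only the lookup differs.
    return primary, SCOPE.get(primary, "unknown")


for tag in ("nai-Latn", "zh-Hant", "cmn", "en-US"):
    print(tag, describe(tag))
```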

> [ISO 639-5] is, like the original 639-1, so small as to be relatively 
> useless.  The fact that it can be expanded through the normal change 
> process is not very useful:  it will take a *LONG* time to get 
> everything in that we as linguists need.

It's simply not meant for use by linguists.

> [S]ome of the names used are enough to make linguists cringe.

True enough.

    A cocky novice once said to Stallman: "I can guess why the editor
    is called Emacs, but why is the justifier called Bolio?". Stallman
    replied forcefully, "Names are but names.  'Emack & Bolio's' is the
    name of a popular ice-cream shop in Boston-town. Neither of these
    men had anything to do with the software."

    His question answered, yet unanswered, the novice turned to go,
    but Stallman called to him, "Neither Emack nor Bolio had anything
    to do with the ice-cream shop, either."

This is generally known as the ice-cream koan.

-- 
He played King Lear as though           John Cowan <cowan at ccil.org>
someone had played the ace.             http://www.ccil.org/~cowan
        --Eugene Field

