Ietf-languages Digest, Vol 74, Issue 1
aristar at linguistlist.org
Sun Feb 22 16:11:50 CET 2009
Well, it's interesting to know the background for this set. But it
raises a recurrent issue for ISO standards. More than once in the past
I've seen a standard promulgated, yet the explanations for the oddities
of that standard are known only to those who happen to be in a select
group. This means that standards generate a kind of contempt among
those who are outsiders.
The official title of 639-5 is, after all: "Codes for the Representation
of Names of Languages. Part 5: Alpha-3 code for language families and
groups". If what John says is true -- and I am quite willing to believe
it is -- then this title is not merely misleading, but erroneous. This
is not, it seems, what these codes are actually doing.
Furthermore, the idea of "vague" codes is a very useful one. There are
indeed situations where you would like to tag a data-set as just
"North American Indian". But the solution is not to produce a code-set
that is so internally confused about whether it refers to geographical
regions or linguistic ones that it is more likely to generate derision
than acceptance.
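To make the "vague" tagging concrete, here is a rough Python sketch of how
a tool might split tags such as "nai", "nai-Latn" and "nai-fonipa" (the
examples John gives in the message quoted below) into their parts. It is
only an illustration: the pattern covers a small subset of BCP 47 (language,
optional script, optional region, variants), and the names parse_tag,
is_collection and COLLECTIONS are hypothetical, as is the one-entry table
of collection codes.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Illustrative subset only: language[-script][-region][-variant...].
# Extended language subtags, extensions, private use and grandfathered
# tags are deliberately NOT handled.
TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"                 # 639-1 (2 letters) or alpha-3 code
    r"(?:-(?P<script>[a-z]{4}))?"                # e.g. Latn
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"       # e.g. US or 419
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",  # e.g. fonipa
    re.IGNORECASE,
)

# Hypothetical, hand-picked sample; a real tool would load the registry.
COLLECTIONS = {"nai": "North American Indian languages"}


class ParsedTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


def parse_tag(tag: str) -> ParsedTag:
    """Split a tag into subtags, normalizing case along the way."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not in the supported subset: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    variants = tuple(v.lower() for v in m.group("variants").split("-") if v)
    return ParsedTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=variants,
    )


def is_collection(tag: str) -> bool:
    """True if the primary language subtag is a known collection code."""
    return parse_tag(tag).language in COLLECTIONS
```

Note that the same three-letter slot accepts a collection code like "nai"
and the undetermined code "und" alike; the syntax alone cannot tell them
apart, so the distinction has to come from a lookup table.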
John Cowan wrote:
> [Quoted fragments have been reordered]
> Anthony Aristar scripsit:
>> [T]he code-set is a mish-mash that is very reminiscent of the mess
>> that ISO 639-1/2 were before ISO 639-3 came along [...].
> Not surprising: it's the same mish-mash, just with additional codes for
> some well-known groupings.
> The purpose of 639-5, at least in connection with BCP 47, is to make it
> possible to tag documents whose language has not been determined exactly.
> It allows vagueness. You may not know the exact language of a document,
> but perhaps you at least know that it is written in a North American
> Indian language, so you can tag it "nai" or perhaps "nai-Latn" or
> "nai-fonipa" to add information about the transcription. That gives
> someone classifying or retrieving the document something more to go on
> than a flat "und" or other indicator of absence.
> For classification, it doesn't much matter if a group is genetic or not.
> Indeed, genetic groupings may be singularly unhelpful, not to mention
> unstable, in parts of the world where the relationships between languages
> are not yet firmly established. And in the BCP 47 world, we value
> stability at least slightly higher than truth.
>> [T]he use of Alpha-3 makes the codes easily confusable with ISO
>> 639-3. I know of at least one project that simply won't use them
>> because of this.
> Whereas that is very convenient for BCP 47 purposes: the "primary
> language" subtag can be a collection, a macrolanguage, or an individual
> language without having to have variable syntax (except for the use of
> 639-1 two-letter codes, which is retained for backward compatibility).
>> [ISO 639-5] is, like the original 639-1, so small as to be relatively
>> useless. The fact that it can be expanded through the normal change
>> process is not very useful: it will take a *LONG* time to get
>> everything in that we as linguists need.
> It's simply not meant for use by linguists.
>> [S]ome of the names used are enough to make linguists cringe.
> True enough.
> A cocky novice once said to Stallman: "I can guess why the editor
> is called Emacs, but why is the justifier called Bolio?". Stallman
> replied forcefully, "Names are but names. 'Emack & Bolio's' is the
> name of a popular ice-cream shop in Boston-town. Neither of these
> men had anything to do with the software."
> His question answered, yet unanswered, the novice turned to go,
> but Stallman called to him, "Neither Emack nor Bolio had anything
> to do with the ice-cream shop, either."
> This is generally known as the ice-cream koan.
Anthony Aristar, Director, Institute for Language & Information Technology
Professor of Linguistics Moderator, LINGUIST Linguistics Program
Dept. of English aristar at linguistlist.org
Eastern Michigan University 2000 Huron River Dr, Suite 104
Ypsilanti, MI 48197