Ietf-languages Digest, Vol 74, Issue 1

Sun Feb 22 16:11:50 CET 2009

Well, it's interesting to know the background for this set.  But it
raises a recurrent issue for ISO standards.  More than once in the past
I've seen a standard promulgated, yet the explanation for the oddities
of that standard is known only to those who happen to be in a select
group.  This means that standards generate a kind of contempt among
those who are outsiders.

The official title of 639-5 is, after all: "Codes for the Representation
of Names of Languages. Part 5: Alpha-3 code for language families and
groups".  If what John says is true -- and I am quite willing to believe
it is -- then this title is not merely misleading, but erroneous.  This
is not, it seems, what these codes are actually doing.

Furthermore, the idea of "vague" codes is a very useful one.  There are
indeed situations where you would like to tag a data-set as just
"North American Indian".  But the solution is not to produce a code-set
that is so internally confused about whether it refers to geographical
regions or linguistic ones that it is more likely to generate derision
than acceptance.

John Cowan wrote:
> [Quoted fragments have been reordered]
>
> Anthony Aristar scripsit:
>
>   
>> [T]he code-set is a mish-mash that is very reminiscent of the mess
>> that ISO 639-1/2 were before ISO 639-3 came along [...].
>>     
>
> Not surprising: it's the same mish-mash, just with additional codes for
> some well-known groupings.
>
> The purpose of 639-5, at least in connection with BCP 47, is to make it
> possible to tag documents whose language has not been determined exactly.
> It allows vagueness.  You may not know the exact language of a document,
> but perhaps you at least know that it is written in a North American
> Indian language, so you can tag it "nai" or perhaps "nai-Latn" or
> "nai-fonipa" to add information about the transcription.  That gives
> someone classifying or retrieving the document something more to go on
> than a flat "und" or other indicator of absence.
>
> For classification, it doesn't much matter if a group is genetic or not.
> Indeed, genetic groupings may be singularly unhelpful, not to mention
> unstable, in parts of the world where the relationships between languages
> are not yet firmly established.  And in the BCP 47 world, we value
> stability at least slightly higher than truth.
>
>   
>> [T]he use of Alpha-3 makes the codes easily confusable with ISO 
>> 639-3.  I know of at least one project that simply won't use them 
>> because of this.
>>     
>
> Whereas that is very convenient for BCP 47 purposes: the "primary
> language" subtag can be a collection, a macrolanguage, or an individual
> language without having to have variable syntax (except for the use of
> 639-1 two-letter codes, which is retained for backward compatibility).
>
>   
>> [ISO 639-5] is, like the original 639-1, so small as to be relatively 
>> useless.  The fact that it can be expanded through the normal change 
>> process is not very useful:  it will take a *LONG* time to get 
>> everything in that we as linguists need.
>>     
>
> It's simply not meant for use by linguists.
>
>   
>> [S]ome of the names used are enough to make linguists cringe.
>>     
>
> True enough.
>
>     A cocky novice once said to Stallman: "I can guess why the editor
>     is called Emacs, but why is the justifier called Bolio?". Stallman
>     replied forcefully, "Names are but names.  'Emack & Bolio's' is the
>     name of a popular ice-cream shop in Boston-town. Neither of these
>     men had anything to do with the software."
>
>     His question answered, yet unanswered, the novice turned to go,
>     but Stallman called to him, "Neither Emack nor Bolio had anything
>     to do with the ice-cream shop, either."
>
> This is generally known as the ice-cream koan.
>
>   
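
For anyone who wants to see what the tags John mentions look like once
they are split apart, here is a minimal sketch in Python.  It is not a
conforming BCP 47 parser: it assumes a tag consists only of a primary
language subtag followed by optional script, region and variant subtags,
and it ignores extlang, extensions, private use and grandfathered tags.
The function name split_tag is hypothetical, chosen for illustration,
and nothing here checks the subtags against the registry.

```python
import re


def split_tag(tag):
    """Split a simple language tag into its subtags.

    Simplified assumption: language[-script][-region][-variant...] only.
    Subtags are classified by shape, not looked up in any registry.
    """
    subtags = tag.split("-")
    language = subtags[0]
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError("unsupported primary language subtag: %r" % language)

    result = {
        "language": language.lower(),
        "script": None,
        "region": None,
        "variants": [],
    }
    for sub in subtags[1:]:
        no_region_or_variant = result["region"] is None and not result["variants"]
        if (re.fullmatch(r"[A-Za-z]{4}", sub)
                and result["script"] is None and no_region_or_variant):
            # Four letters directly after the language: a script, e.g. "Latn".
            result["script"] = sub.title()
        elif (re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub)
                and result["region"] is None and not result["variants"]):
            # Two letters or three digits: a region.
            result["region"] = sub.upper()
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            # Five to eight characters (or four starting with a digit):
            # a variant, e.g. "fonipa".
            result["variants"].append(sub.lower())
        else:
            raise ValueError("unexpected subtag %r in %r" % (sub, tag))
    return result


if __name__ == "__main__":
    for example in ("und", "nai", "nai-Latn", "nai-fonipa"):
        print(example, split_tag(example))
```

The point of the sketch is only that the primary subtag has the same
three-letter shape whether it names a collection ("nai"), an individual
language, or nothing at all ("und"); which of those it is cannot be read
off the syntax and has to come from the registry.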

-- 
             **************************************
Anthony Aristar, Director, Institute for Language & Information Technology
  Professor of Linguistics            Moderator, LINGUIST Linguistics Program
Dept. of English                       aristar at linguistlist.org
Eastern Michigan University            2000 Huron River Dr, Suite 104
Ypsilanti, MI 48197
U.S.A.

URL: http://linguistlist.org/aristar/
             **************************************