New item in ISO 639-2 - Zaza

Thu Aug 24 16:10:32 CEST 2006

+1 in toto.  Brilliant.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/

----- Original Message ----- 
From: "John Cowan" <cowan at ccil.org>
To: "Mark Davis" <mark.davis at icu-project.org>
Cc: "Addison Phillips" <addison at yahoo-inc.com>; 
<ietf-languages at iana.org>; "Doug Ewell" <dewell at adelphia.net>
Sent: Thursday, August 24, 2006 6:48
Subject: Re: New item in ISO 639-2 - Zaza


> Mark Davis scripsit:
>
>>   3. Allow both 1 and 2 as synonyms.
>>      - zh-cmn-CN and cmn-CN are both valid and synonymous.
>
> This is completely contrary to the spirit of RFC 3066.  We have tag 
> synonymy only in a few cases, and only where various RAs and MAs have 
> made mistakes.  You are talking about providing 350 cases of 
> gratuitous synonymy.
>
>>      - - means adding more structure (which we have allowed for).
>>      - ± automatic fallback (if we canonicalize to the longer form)
>
> Canonicalization is not part of 3066bis, just a recommendation.
>
>>      - - testing for validity is slightly more complicated (need to
>>      check that the combination of lang + extlang for the long form
>>      is valid)
>
> Lang+extlang checking is just prefix checking, which is already being 
> done for variants.
>
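A minimal sketch of the prefix check described above, in Python. The
table below is a hypothetical registry excerpt built only from the
Chinese subtags named in this thread, and treating any three-letter
alphabetic subtag after the primary language as an extlang is an
assumption about the draft syntax, not something stated in this message.

```python
# Hypothetical registry excerpt: extlang subtag -> the primary language
# subtag it must follow (its "prefix"). Only subtags mentioned in this
# thread are listed.
EXTLANG_PREFIX = {
    "cmn": "zh",
    "gan": "zh",
    "wuu": "zh",
    "yue": "zh",
}


def lang_extlang_valid(tag):
    """Return True if every extlang in the tag follows its required prefix.

    Assumption: a three-letter alphabetic subtag directly after the
    primary language is an extlang; anything else ends the extlang run.
    """
    subtags = tag.lower().split("-")
    lang = subtags[0]
    for sub in subtags[1:]:
        if not (len(sub) == 3 and sub.isalpha()):
            break  # script, region, variant, etc.: not checked here
        if EXTLANG_PREFIX.get(sub) != lang:
            return False  # unknown extlang, or wrong prefix
    return True
```

With this table, "zh-cmn-CN" passes and "de-cmn" fails, which is the
same kind of lookup a validator would already perform for variants.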
>> A big question in my mind is the stability of the macro language 
>> inclusion relationship. If there is the remotest chance that they 
>> will change, eg that someday de becomes a macro language that 
>> includes de, sli, sxu, ltz, vmf, etc. 
>> (http://www.ethnologue.com/show_family.asp?subid=90073) then the only 
>> choice we have is #1.
>
> I agree with the hypothetical -- but I think it will remain purely 
> hypothetical, for this reason:
>
> Macrolanguages exist as a shim between the moderate lumper tendencies 
> of 639-2 and the extreme splitter tendencies of 639-3.  Wherever 639-2 
> has lumped language varieties that 639-3 considers distinct languages, 
> a macrolanguage is created.  The chance that such a well-used code as 
> "de" will be redefined away from meaning "Standard German" is 
> effectively nil. And 639-3/RA isn't going to gratuitously create 
> macrolanguages otherwise -- they are a wart on the standard.
>
> A very thorough multi-year analysis has caught all such cases, and we 
> can be confident that as of when 639-3/RA joins the RA/JAC there 
> should be no more lurking undiscovered.
>
> We then have to deal with two kinds of retroactive creation of 
> macrolanguages:  when 639-2/RA registers a lumped language, as they 
> have just done, and when 639-3/RA decides on the basis of new evidence 
> to split one of their existing languages.  I would hope that cases of 
> the first kind will cease, but cases of the second kind are always a 
> possibility when dealing with little-known languages -- the Ethnologue 
> pages are full of notes like "XXX dialect may be a separate language". 
> Luckily, handling them is easy: the existing language subtag remains 
> in place, and we add two new 639-3-specified extlang subtags, one for 
> the newly recognized language and one to cover the mainstream 
> dialects.
>
> When we do get a case of the first kind, under option #1 we must 
> decide either case by case or once and for all what to do: add then 
> deprecate the new subtag (as we do with changes to country codes, but 
> without a specific replacement), or deprecate the existing language 
> subtags and introduce corresponding extlang tags under the new tag. 
> Under option #2 we don't have to do anything special -- but we risk 
> substantial user confusion.
>
>> The more I think about it, the more I like #1. We already have to do 
>> fallback between language subtags (think no, nb, nn), and this 
>> recasts the issue into providing additional data so that if I don't 
>> find language subtag X, I can determine the next best choice Y.
>
> And I still strongly favor #2.  The last thing we want is a situation 
> where most people continue to use "zh" to tag Mandarin Chinese 
> documents (the overwhelming majority of all Chinese documents) and 
> some start to use "cmn".  This isn't a trivial case like the Norwegian 
> one; there are 350 subtags we are talking about here.  We would in 
> effect have to introduce a major revision to the matching draft in 
> order to make these remappings part of it -- something I at least had 
> very much hoped to avoid.
>
> No, let most people write "zh", let those who care write "zh-cmn" (as 
> they can already do, thanks to a grandfathered RFC 3066 tag), which 
> will fall back to "zh", and let people who use the existing tags 
> "zh-gan", "zh-wuu", and "zh-yue" continue to have the right results, 
> but now as part of the standard rather than as a grandfathered 
> exception. (Some grandfathered tags will have to be deprecated.)
>
> Overall, though, #2 is the conservative choice both in fallback 
> behavior and for existing language tags.
>
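A rough Python sketch of the fallback behavior being contrasted here,
assuming fallback means dropping trailing subtags one at a time. The
function names and the set of available tags are hypothetical and are
not taken from the matching draft.

```python
def fallback_chain(tag):
    """List the tag followed by each shorter prefix, longest first."""
    subtags = tag.lower().split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def lookup(tag, available):
    """Return the first entry of the fallback chain found in `available`.

    `available` is assumed to hold lowercase tags. Returns None when no
    prefix of the requested tag is available.
    """
    for candidate in fallback_chain(tag):
        if candidate in available:
            return candidate
    return None
```

Under option #2, lookup("zh-cmn-CN", {"zh", "en"}) reaches "zh" by
truncation alone. Under option #1, lookup("cmn-CN", {"zh", "en"}) finds
nothing unless extra remapping data from "cmn" to "zh" is supplied,
which is the additional matching machinery discussed above.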
> -- 
> They do not preach                              John Cowan
>  that their God will rouse them                cowan at ccil.org
>    A little before the nuts work loose.        http://www.ccil.org/~cowan
> They do not teach
>  that His Pity allows them                         --Rudyard Kipling,
>    to drop their job when they damn-well choose.   "The Sons of Martha"
>