New item in ISO 639-2 - Zaza
Doug Ewell
dewell at adelphia.net
Thu Aug 24 16:10:32 CEST 2006
+1 in toto. Brilliant.
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
----- Original Message -----
From: "John Cowan" <cowan at ccil.org>
To: "Mark Davis" <mark.davis at icu-project.org>
Cc: "Addison Phillips" <addison at yahoo-inc.com>;
<ietf-languages at iana.org>; "Doug Ewell" <dewell at adelphia.net>
Sent: Thursday, August 24, 2006 6:48
Subject: Re: New item in ISO 639-2 - Zaza
> Mark Davis scripsit:
>
>> 3. Allow both 1 and 2 as synonyms.
>> - zh-cmn-CN and cmn-CN are both valid and synonymous.
>
> This is completely contrary to the spirit of RFC 3066. We have tag
> synonymy only in a few cases, and only where various RAs and MAs have
> made mistakes. You are talking about providing 350 cases of
> gratuitous synonymy.
>
>> - - means adding more structure (which we have allowed for).
>> - ± automatic fallback (if we canonicalize to the longer form)
>
> Canonicalization is not part of 3066bis, just a recommendation.
>
>> - - testing for validity is slightly more complicated (need to
>> check that the combination of lang + extlang for the long form
>> is valid)
>
> Lang+extlang checking is just prefix checking, which is already being
> done for variants.
>
>> A big question in my mind is the stability of the macro language
>> inclusion relationship. If there is the remotest chance that they
>> will change, e.g., that someday de becomes a macro language that
>> includes de, sli, sxu, ltz, vmf, etc.
>> (http://www.ethnologue.com/show_family.asp?subid=90073), then the only
>> choice we have is #1.
>
> I agree with the hypothetical -- but I think it will remain purely
> hypothetical, for this reason:
>
> Macrolanguages exist as a shim between the moderate lumper tendencies
> of 639-2 and the extreme splitter tendencies of 639-3. Wherever 639-2
> has lumped language varieties that 639-3 considers distinct languages,
> a macrolanguage is created. The chance that such a well-used code as
> "de" will be redefined away from meaning "Standard German" is
> effectively nil. And 639-3/RA isn't going to gratuitously create
> macrolanguages otherwise -- they are a wart on the standard.
>
> A very thorough multi-year analysis has caught all such cases, and we
> can be confident that, as of when 639-3/RA joins the RA/JAC, there
> should be no more cases lurking undiscovered.
>
> We then have to deal with two kinds of retroactive creation of
> macrolanguages: when 639-2/RA registers a lumped language, as they
> have just done, and when 639-3/RA decides on the basis of new evidence
> to split one of their existing languages. I would hope that cases of
> the first kind will cease, but cases of the second kind are always a
> possibility when dealing with little-known languages -- the Ethnologue
> pages are full of notes like "XXX dialect may be a separate language".
> Luckily, handling them is easy: the existing language subtag remains
> in place, and we add two new 639-3-specified extlang subtags, one for
> the newly recognized language and one to cover the mainstream
> dialects.
>
> When we do get a case of the first kind, under option #1 we must
> decide, either case by case or once and for all, what to do: add the
> new subtag and deprecate it (as we do with changes to country codes, but
> without a specific replacement), or deprecate the existing language
> subtags and introduce corresponding extlang tags under the new tag.
> Under option #2 we don't have to do anything special -- but we risk
> substantial user confusion.
>
>> The more I think about it, the more I like #1. We already have to do
>> fallback between language subtags (think no, nb, nn), and this
>> recasts the issue into providing additional data so that if I don't
>> find language subtag X, I can tell what the next best choice Y is.
>
> And I still strongly favor #2. The last thing we want is a situation
> where most people continue to use "zh" to tag Mandarin Chinese
> documents (the overwhelming majority of all Chinese documents) and
> some start to use "cmn". This isn't a trivial case like the Norwegian
> one; there are 350 subtags we are talking about here. We would in
> effect have to introduce a major revision to the matching draft in
> order to make these remappings part of it -- something I at least had
> very much hoped to avoid.
>
> No, let most people write "zh", let those who care write "zh-cmn" (as
> they can already do, thanks to a grandfathered RFC 3066 tag), which
> will fall back to "zh", and let people who use the existing tags
> "zh-gan", "zh-wuu", and "zh-yue" continue to have the right results,
> but now as part of the standard rather than as a grandfathered
> exception. (Some grandfathered tags will have to be deprecated.)
>
> Overall, though, #2 is the conservative choice both in fallback
> behavior and for existing language tags.
>
> --
> They do not preach
> that their God will rouse them
> A little before the nuts work loose.
> They do not teach
> that His Pity allows them
> to drop their job when they damn-well choose.
>     --Rudyard Kipling, "The Sons of Martha"
>
> John Cowan
> cowan at ccil.org
> http://www.ccil.org/~cowan
>
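Cowan's point that lang+extlang checking is just prefix checking can be sketched roughly as follows. This is only an illustration under stated assumptions: the EXTLANG_PREFIX table and the extlang_is_valid function are hypothetical, the table is a tiny made-up excerpt rather than real registry data, and the check looks only at the second subtag of a tag.

```python
# Hypothetical, illustrative excerpt of registry data: each extlang subtag
# maps to the one language subtag it is allowed to follow (its prefix).
EXTLANG_PREFIX = {
    "cmn": "zh",
    "yue": "zh",
    "gan": "zh",
    "wuu": "zh",
}


def extlang_is_valid(tag: str) -> bool:
    """Return True unless the tag's extlang subtag violates its prefix.

    Simplified sketch: only the second subtag is examined, and only a
    three-letter alphabetic second subtag is treated as an extlang.
    """
    subtags = tag.lower().split("-")
    if len(subtags) < 2:
        return True  # a bare language subtag has no extlang to check
    lang, second = subtags[0], subtags[1]
    if len(second) == 3 and second.isalpha():
        prefix = EXTLANG_PREFIX.get(second)
        if prefix is None:
            return False  # unknown extlang subtag
        return lang == prefix  # same idea as prefix checking for variants
    return True  # second subtag is a script, region, etc.


print(extlang_is_valid("zh-cmn-CN"))  # True
print(extlang_is_valid("de-cmn"))     # False
```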
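The fallback Cowan describes for option #2, where "zh-cmn" falls back to "zh", amounts to dropping subtags from the end of a tag until something matches. Here is a simplified sketch, assuming a hypothetical lookup function and a plain collection of available tags. It is not a full matching algorithm; for instance, it does not skip single-letter subtags left dangling at the end.

```python
from typing import Iterable, Optional


def lookup(tag: str, available: Iterable[str]) -> Optional[str]:
    """Drop trailing subtags from `tag` until one of `available` matches."""
    wanted = {t.lower() for t in available}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in wanted:
            return candidate
        subtags.pop()  # remove the last subtag and try again
    return None


print(lookup("zh-cmn-CN", {"zh", "en"}))  # zh
print(lookup("cmn-CN", {"zh", "en"}))     # None
```

Note that "cmn-CN" never reaches "zh" by truncation alone. That gap is why option #1 would need the extra fallback data Mark Davis mentions.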
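Mark Davis's option #1 would instead rely on extra data saying which language subtag is the next best choice when subtag X is not found, as with no, nb, and nn. A minimal sketch follows, assuming a hypothetical FALLBACK table and lookup_with_fallback function. The entries are illustrative only and are not taken from any registry.

```python
from typing import Dict, Iterable, Optional

# Hypothetical fallback data for option #1: language subtag -> next best
# language subtag. Illustrative entries only.
FALLBACK: Dict[str, str] = {
    "cmn": "zh",
    "yue": "zh",
    "nb": "no",
    "nn": "no",
}


def lookup_with_fallback(lang: str, available: Iterable[str],
                         fallback: Dict[str, str] = FALLBACK) -> Optional[str]:
    """Follow the fallback chain from `lang` until an available subtag is found."""
    wanted = {a.lower() for a in available}
    seen = set()
    current: Optional[str] = lang.lower()
    while current is not None and current not in seen:
        if current in wanted:
            return current
        seen.add(current)  # guard against cycles in the fallback data
        current = fallback.get(current)
    return None


print(lookup_with_fallback("cmn", {"zh", "en"}))  # zh
print(lookup_with_fallback("nn", {"no"}))         # no
print(lookup_with_fallback("fr", {"zh"}))         # None
```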
More information about the Ietf-languages mailing list