New item in ISO 639-2 - Zaza
John Cowan
cowan at ccil.org
Thu Aug 24 15:48:48 CEST 2006
Mark Davis scripsit:
> 3. Allow both 1 and 2 as synonyms.
> - zh-cmn-CN and cmn-CN are both valid and synonymous.
This is completely contrary to the spirit of RFC 3066. We have
tag synonymy only in a few cases, and only where various RAs and MAs
have made mistakes. You are talking about providing 350 cases of
gratuitous synonymy.
> - - means adding more structure (which we have allowed for).
> - ± automatic fallback (if we canonicalize to the longer form)
Canonicalization is not part of 3066bis, just a recommendation.
> - - testing for validity is slightly more complicated (need to
> check that the combination of lang + extlang for the long form is
> valid)
Lang+extlang checking is just prefix checking, which is already being
done for variants.
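Cowan's point is that checking a lang+extlang pair is the same kind of prefix check already done for variants. Here is a minimal Python sketch of that check. The registry excerpt `EXTLANG_PREFIX` and the function `extlang_is_valid` are hypothetical illustrations, not the actual registry format or any published API.

```python
from typing import Dict

# Hypothetical registry excerpt: each extlang subtag records the single
# language subtag it may follow (its prefix), just as variants record theirs.
EXTLANG_PREFIX: Dict[str, str] = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Cantonese
    "gan": "zh",  # Gan Chinese
    "wuu": "zh",  # Wu Chinese
}


def extlang_is_valid(tag: str) -> bool:
    """Check only the lang+extlang pair of a tag such as 'zh-cmn-CN'.

    Script, region, and variant subtags are not validated here.
    """
    subtags = tag.lower().split("-")
    if len(subtags) < 2:
        return True  # a bare language subtag has no extlang to check
    lang, second = subtags[0], subtags[1]
    # Only a three-letter alphabetic second subtag is an extlang;
    # regions are 2 letters or 3 digits, scripts are 4 letters.
    if len(second) != 3 or not second.isalpha():
        return True
    # Reject unregistered extlangs and extlangs under the wrong prefix.
    return EXTLANG_PREFIX.get(second) == lang
```

With this data, `zh-cmn-CN` passes, `de-cmn` fails because `cmn` is registered only under `zh`, and `de-CH` is unaffected because `CH` is a region.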
> A big question in my mind is the stability of the macro
> language inclusion relationship. If there is the remotest
> chance that they will change, eg that someday de becomes
> a macro language that includes de, sli, sxu, ltz, vmf,
> etc. (http://www.ethnologue.com/show_family.asp?subid=90073)
> then the only choice we have is #1.
I agree with the hypothetical -- but I think it will remain purely
hypothetical, for this reason:
Macrolanguages exist as a shim between the moderate lumper tendencies
of 639-2 and the extreme splitter tendencies of 639-3. Wherever 639-2
has lumped language varieties that 639-3 considers distinct languages,
a macrolanguage is created. The chance that such a well-used code as "de"
will be redefined away from meaning "Standard German" is effectively nil.
And 639-3/RA isn't going to gratuitously create macrolanguages otherwise
-- they are a wart on the standard.
A very thorough multi-year analysis has caught all such cases, and we
can be confident that by the time 639-3/RA joins the RA/JAC, no more
of them will be lurking undiscovered.
We then have to deal with two kinds of retroactive creation of
macrolanguages: when 639-2/RA registers a lumped language, as they
have just done, and when 639-3/RA decides on the basis of new evidence
to split one of their existing languages. I would hope that cases of
the first kind will cease, but cases of the second kind are always a
possibility when dealing with little-known languages -- the Ethnologue
pages are full of notes like "XXX dialect may be a separate language".
Luckily, handling them is easy: the existing language subtag remains in
place, and we add two new 639-3-specified extlang subtags, one for the
newly recognized language and one to cover the mainstream dialects.
When we do get a case of the first kind, under option #1 we must decide
either case by case or once and for all what to do: add the new subtag
and deprecate it at once (as we do with changes to country codes, but without a specific
replacement), or deprecate the existing language subtags and introduce
corresponding extlang tags under the new tag. Under option #2 we don't
have to do anything special -- but we risk substantial user confusion.
> The more I think about it, the more I like #1. We already have to
> do fallback between language subtags (think no, nb, nn), and this
> recasts the issue into providing additional data so that if I don't
> find language subtag X, I can what is the next best choice Y.
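The explicit fallback data Mark describes for option #1 might look like the following minimal Python sketch. The table `NEXT_BEST` and the functions `fallback_chain` and `pick` are hypothetical. Under option #1, every one of the roughly 350 affected subtags would need an entry of this kind, which is the cost Cowan objects to below.

```python
from typing import Dict, List, Optional, Set

# Hypothetical option #1 data: for a language subtag that is not
# supported, the table names the next best choice.
NEXT_BEST: Dict[str, str] = {
    "nb": "no",   # Norwegian Bokmål -> Norwegian
    "nn": "no",   # Norwegian Nynorsk -> Norwegian
    "cmn": "zh",  # Mandarin Chinese -> Chinese
    "yue": "zh",  # Cantonese -> Chinese
}


def fallback_chain(lang: str) -> List[str]:
    """Return lang followed by its successive next-best choices."""
    chain = [lang]
    seen = {lang}
    while chain[-1] in NEXT_BEST:
        nxt = NEXT_BEST[chain[-1]]
        if nxt in seen:  # guard against cycles in the table
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def pick(lang: str, available: Set[str]) -> Optional[str]:
    """Return the first entry of the fallback chain that is available."""
    for candidate in fallback_chain(lang):
        if candidate in available:
            return candidate
    return None
```

For example, `pick("cmn", {"zh", "en"})` yields `"zh"` only because the table says so. Nothing in the tag `cmn` itself reveals the relationship.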
And I still strongly favor #2. The last thing we want is a situation
where most people continue to use "zh" to tag Mandarin Chinese documents
(the overwhelming majority of all Chinese documents) and some start to use
"cmn". This isn't a trivial case like the Norwegian one; there are 350
subtags we are talking about here. We would in effect have to introduce
a major revision to the matching draft in order to make these remappings
part of it -- something I at least had very much hoped to avoid.
No, let most people write "zh", let those who care write "zh-cmn"
(as they can already do, thanks to a grandfathered RFC 3066 tag),
which will fall back to "zh", and let people who use the existing tags
"zh-gan", "zh-wuu", and "zh-yue" continue to have the right results,
but now as part of the standard rather than as a grandfathered exception.
(Some grandfathered tags will have to be deprecated.)
Overall, though, #2 is the conservative choice both in fallback behavior
and for existing language tags.
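Under option #2, the fallback from "zh-cmn" to "zh" comes for free from truncation, the kind of lookup described in the matching draft (later published as RFC 4647). Here is a simplified Python sketch that assumes lowercase entries in `available`. It omits wildcards and default values, and the function names are hypothetical.

```python
from typing import List, Optional, Set


def truncation_chain(tag: str) -> List[str]:
    """Progressively shorter forms of tag, longest first."""
    subtags = tag.lower().split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # Also drop a singleton (e.g. 'x') left dangling at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(tag: str, available: Set[str]) -> Optional[str]:
    """Return the longest truncation of tag found in available."""
    for candidate in truncation_chain(tag):
        if candidate in available:
            return candidate
    return None
```

Because "cmn" sits under "zh" in the tag itself, `lookup("zh-cmn", {"zh", "en"})` finds `"zh"` without any extra table. Existing "zh-yue" content is still matched exactly by a request for "zh-yue-HK".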
--
They do not preach
that their God will rouse them
A little before the nuts work loose.
They do not teach
that His Pity allows them
to drop their job when they damn-well choose.
    --Rudyard Kipling, "The Sons of Martha"

John Cowan
cowan at ccil.org
http://www.ccil.org/~cowan