New item in ISO 639-2 - Zaza
Doug Ewell
dewell at adelphia.net
Thu Aug 24 09:06:32 CEST 2006
Mark Davis wrote:
> This is still an open issue. John discussed the choices in
> http://www.alvestrand.no/pipermail/ietf-languages/2004-August/002214.html
Thanks for reminding us all of John's excellent contribution to an
excellent thread.
> To add examples and some pros & cons to his list:
>
> 1. Allow the individual language codes
> - cmn-CN is valid
> - zh-cmn-CN is not
> - + Simplest approach, no new structure.
> - - fallback requires extra information.
The fallback problem has been considered paramount to this point, and
was the reason for inventing extended language subtags in the first
place. The current (draft) 639-3 table includes more than 350 language
codes that belong to one macrolanguage or another. That would be a
hefty table for everyone to carry around just to do basic matching.
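
To make the cost concrete, here is a minimal Python sketch of fallback under option 1, assuming a member-to-macrolanguage table has to ship with every matcher. The two-entry table and the function names are hypothetical, purely for illustration; the real mapping would come from the draft 639-3 data and would be far larger.

```python
# Hypothetical excerpt of a member -> macrolanguage table (an assumption for
# illustration; the full draft 639-3 mapping is not reproduced here).
MACROLANGUAGE = {"cmn": "zh", "yue": "zh"}


def truncations(subtags):
    """Yield progressively shorter tags by dropping trailing subtags."""
    for end in range(len(subtags), 0, -1):
        yield "-".join(subtags[:end])


def fallback_candidates(tag, table=MACROLANGUAGE):
    """List candidate tags to try, most specific first.

    Plain truncation comes first; the macrolanguage is only reachable
    because of the extra table lookup.
    """
    subtags = tag.split("-")
    candidates = list(truncations(subtags))
    macro = table.get(subtags[0].lower())
    if macro is not None:
        candidates.extend(truncations([macro] + subtags[1:]))
    return candidates


print(fallback_candidates("cmn-CN"))
```

Without the table, "cmn-CN" can only truncate to "cmn" and never reaches "zh".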
> 2. Allow the higher-order code extended by an individual language
> code
> - cmn-CN is not valid
> - zh-cmn-CN is valid
> - - means adding more structure (which we have allowed for).
> - + automatic fallback
> - - testing for validity is slightly more complicated (need to check
> that the combination of lang + extlang is valid)
As you indicated, the first minus really isn't a minus since we've
already done the work of defining extlangs in the ABNF. And since
validity checking requires the entire Registry anyway, complete with
Preferred-Script checking and prefix checking on variants, adding prefix
checking on extlangs is only *very* slightly more complicated.
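
As a rough sketch of how small that extra check is, assuming the Registry records one required prefix per extlang subtag (the table contents and function names below are hypothetical, not taken from any draft):

```python
# Hypothetical registry excerpt: extlang subtag -> required primary language.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh"}


def lang_extlang_ok(tag, prefixes=EXTLANG_PREFIX):
    """Check only the lang + extlang rule of option 2, nothing else."""
    subtags = tag.lower().split("-")
    if subtags[0] in prefixes:
        return False  # a member code may not stand alone as the primary subtag
    if len(subtags) > 1 and subtags[1] in prefixes:
        return prefixes[subtags[1]] == subtags[0]
    return True  # no extlang present, so there is nothing to check here


def truncation_fallback(tag):
    """Fallback needs no table: just drop trailing subtags."""
    subtags = tag.split("-")
    return ["-".join(subtags[:end]) for end in range(len(subtags), 0, -1)]


print(lang_extlang_ok("zh-cmn-CN"), truncation_fallback("zh-cmn-CN"))
```

The validity check is one more dictionary lookup, and fallback from "zh-cmn-CN" to "zh" falls out of ordinary truncation.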
> 3. Allow both 1 and 2 as synonyms.
> - zh-cmn-CN and cmn-CN are both valid and synonymous.
> - - means adding more structure (which we have allowed for).
> - ± automatic fallback (if we canonicalize to the longer form)
> - - testing for validity is slightly more complicated (need to check
> that the combination of lang + extlang for the long form is valid)
I don't see any advantages of #3 over #2, either on your scoreboard or
in my mind.
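
For what it's worth, the "canonicalize to the longer form" step in #3 might look something like the sketch below. It assumes the same kind of member-to-macrolanguage table that #1 needs; the table and the function name are hypothetical.

```python
# Hypothetical excerpt: member language subtag -> macrolanguage subtag.
MACROLANGUAGE = {"cmn": "zh", "yue": "zh"}


def canonicalize_long(tag, table=MACROLANGUAGE):
    """Rewrite the short synonym (cmn-CN) to the long form (zh-cmn-CN)."""
    subtags = tag.split("-")
    macro = table.get(subtags[0].lower())
    if macro is None:
        return tag  # already long form, or not a member language at all
    return "-".join([macro] + subtags)


print(canonicalize_long("cmn-CN"))
```

So #3 carries the table from #1 and the prefix check from #2 at the same time.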
> A big question in my mind is the stability of the macro language
> inclusion relationship. If there is the remotest chance that they will
> change, e.g. that someday de becomes a macro language that includes de,
> sli, sxu, ltz,
lb
> vmf, etc. ( http://www.ethnologue.com/show_family.asp?subid=90073)
> then the only choice we have is #1.
That's a very good question, especially since -- despite the presence of
a draft 639-3 table that considers Dimli and Kirmanjki to be separate
languages -- the ISO 639 RAs-JAC has decided to combine them into one
code element. It's no different from the German example.
> The more I think about it, the more I like #1. We already have to do
> fallback between language subtags (think no, nb, nn), and this recasts
> the issue into providing additional data so that if I don't find
> language subtag X, I can find what is the next best choice Y.
I still prefer #2 over #1, but this is a discussion for LTRU (after the
charter is approved and we can begin in earnest) and not here. I will
say that using no/nb/nn as an exemplar case for fallback does not feel
right; it is not indicated anywhere in the Registry and is mentioned
only as a hypothetical example in the matching draft.
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
Editor, draft-ietf-ltru-initial