New item in ISO 639-2 - Zaza

Doug Ewell dewell at adelphia.net
Thu Aug 24 09:06:32 CEST 2006


Mark Davis wrote:

> This is still an open issue. John discussed the choices in 
> http://www.alvestrand.no/pipermail/ietf-languages/2004-August/002214.html

Thanks for reminding us all of John's excellent contribution to an 
excellent thread.

> To add examples and some pros & cons to his list:
>
> 1.  Allow the individual language codes
> - cmn-CN is valid
> - zh-cmn-CN is not
> - + Simplest approach, no new structure.
> - - fallback requires extra information.

The fallback problem has been considered paramount to this point, and 
was the reason for inventing extended language subtags in the first 
place.  The current (draft) 639-3 table includes more than 350 language 
codes that belong to one macrolanguage or another.  That would be a 
hefty table for everyone to carry around just to do basic matching.
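
To make the fallback problem concrete, here is a minimal Python sketch. It is not taken from any draft. It assumes fallback simply strips subtags from the right, in the spirit of the lookup scheme in the matching draft. The three-entry MACROLANGUAGE_OF dict is hypothetical and stands in for the 350-plus entries a real table would need. Under #2, truncation alone reaches "zh". Under #1 it never does without the table.

```python
# Hypothetical excerpt of a macrolanguage table; the real draft 639-3
# data would need more than 350 entries like these.
MACROLANGUAGE_OF = {"cmn": "zh", "yue": "zh", "arb": "ar"}


def truncation_chain(tag: str) -> list[str]:
    """Fallback by dropping subtags from the right: a-b-c, a-b, a."""
    subtags = tag.split("-")
    return ["-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]


def chain_with_table(tag: str) -> list[str]:
    """Fallback for option #1: truncate, then try the macrolanguage."""
    chain = truncation_chain(tag)
    macro = MACROLANGUAGE_OF.get(tag.split("-")[0].lower())
    if macro is not None:
        chain.append(macro)
    return chain


# Option #2: the extlang form reaches "zh" with no extra data.
print(truncation_chain("zh-cmn-CN"))  # ['zh-cmn-CN', 'zh-cmn', 'zh']
# Option #1: plain truncation stops at "cmn" ...
print(truncation_chain("cmn-CN"))     # ['cmn-CN', 'cmn']
# ... and only the table gets us to "zh".
print(chain_with_table("cmn-CN"))     # ['cmn-CN', 'cmn', 'zh']
```

A fuller version might also try "zh-CN" before "zh"; the point is only that #1 cannot fall back to the macrolanguage without carrying the table.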

> 2.  Allow the higher-order code extended by an individual language 
> code
> - cmn-CN is not valid
> - zh-cmn-CN is valid
> - - means adding more structure (which we have allowed for).
> - + automatic fallback
> - - testing for validity is slightly more complicated (need to check 
> that the combination of lang + extlang is valid)

As you indicated, the first minus really isn't a minus since we've 
already done the work of defining extlangs in the ABNF.  And since 
validity checking requires the entire Registry anyway, complete with 
Preferred-Script checking and prefix checking on variants, adding prefix 
checking on extlangs is only *very* slightly more complicated.
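
For illustration, the extra extlang check can be sketched in Python as a single dictionary lookup, much like a variant's Prefix check. The EXTLANG_PREFIX entries are hypothetical, since no extlangs are actually registered today, and the function covers only the lang + extlang pairing, not the rest of validation.

```python
# Hypothetical registry excerpt: each extlang names the one primary
# language subtag it may follow. Illustrative entries only.
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh", "arb": "ar"}


def extlang_prefix_ok(tag: str) -> bool:
    """Check only the lang + extlang pairing in a tag like "zh-cmn-CN".

    A full validator would also check every other subtag (script,
    region, variants and their prefixes) against the Registry.
    """
    subtags = tag.lower().split("-")
    if len(subtags) < 2:
        return True  # no extlang position to check
    lang, candidate = subtags[0], subtags[1]
    # Per the ABNF, a 3-letter alphabetic subtag right after the
    # primary language is an extlang; regions and scripts differ.
    if len(candidate) != 3 or not candidate.isalpha():
        return True
    return EXTLANG_PREFIX.get(candidate) == lang


print(extlang_prefix_ok("zh-cmn-CN"))  # True
print(extlang_prefix_ok("ar-cmn"))     # False: cmn belongs under zh
print(extlang_prefix_ok("zh-xyz"))     # False: unregistered extlang
```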

> 3.  Allow both 1 and 2 as synonyms.
> - zh-cmn-CN and cmn-CN are both valid and synonymous.
> - - means adding more structure (which we have allowed for).
> - ± automatic fallback (if we canonicalize to the longer form)
> - - testing for validity is slightly more complicated (need to check 
> that the combination of lang + extlang for the long form is valid)

I don't see any advantages of #3 over #2, either on your scoreboard or 
in my mind.
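
Under #3, fallback is only automatic once short forms are canonicalized to the long form. A minimal Python sketch of that step follows, again with a hypothetical EXTLANG_PREFIX table. Note that in this sketch the rewrite needs the same per-extlang data that #1 needed for fallback.

```python
# Hypothetical extlang-to-prefix table (illustrative entries only).
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh", "arb": "ar"}


def canonicalize_long(tag: str) -> str:
    """Rewrite a short form such as "cmn-CN" to "zh-cmn-CN".

    Tags whose primary subtag is not a known extlang, including tags
    already in the long form, are returned unchanged.
    """
    subtags = tag.split("-")
    prefix = EXTLANG_PREFIX.get(subtags[0].lower())
    if prefix is None:
        return tag
    return "-".join([prefix] + subtags)


print(canonicalize_long("cmn-CN"))     # zh-cmn-CN
print(canonicalize_long("zh-cmn-CN"))  # zh-cmn-CN
```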

> A big question in my mind is the stability of the macrolanguage 
> inclusion relationship. If there is the remotest chance that they will 
> change, e.g. that someday de becomes a macrolanguage that includes de, 
> sli, sxu, ltz,

lb

> vmf, etc. ( http://www.ethnologue.com/show_family.asp?subid=90073) 
> then the only choice we have is #1.

That's a very good question, especially since -- despite the presence of 
a draft 639-3 table that considers Dimli and Kirmanjki to be separate 
languages -- the ISO 639 RAs-JAC has decided to combine them into one 
code element.  It's no different from the German example.

> The more I think about it, the more I like #1. We already have to do 
> fallback between language subtags (think no, nb, nn), and this recasts 
> the issue into providing additional data so that if I don't find 
> language subtag X, I can determine what the next best choice Y is.

I still prefer #2 over #1, but this is a discussion for LTRU (after the 
charter is approved and we can begin in earnest) and not here.  I will 
say that using no/nb/nn as an exemplar case for fallback does not feel 
right; it is not indicated anywhere in the Registry and is mentioned 
only as a hypothetical example in the matching draft.
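
For comparison, Mark's recasting of #1 ("if I don't find X, try Y") might look like the following Python sketch. It is hypothetical throughout: the NEXT_BEST data does not exist in the Registry (the nn/nb entries simply mirror his example), and the lookup against a set of available tags is only loosely modeled on the matching draft's lookup.

```python
# Hypothetical "next best choice" data of the kind option #1 would
# need. None of this is in the Registry; entries are illustrative.
NEXT_BEST = {"cmn": "zh", "yue": "zh", "nn": "no", "nb": "no"}


def lookup(requested: str, available: set[str], default: str = "und") -> str:
    """Return the best available tag for `requested`, or `default`.

    Tags are compared case-insensitively by folding to lowercase.
    """
    supported = {a.lower() for a in available}
    tag = requested.lower()
    seen = set()  # guards against cycles in NEXT_BEST
    while tag not in seen:
        seen.add(tag)
        subtags = tag.split("-")
        # First try plain truncation from the right.
        for i in range(len(subtags), 0, -1):
            candidate = "-".join(subtags[:i])
            if candidate in supported:
                return candidate
        # Nothing matched: swap in the next best primary language.
        nxt = NEXT_BEST.get(subtags[0])
        if nxt is None:
            break
        tag = "-".join([nxt] + subtags[1:])
    return default


print(lookup("cmn-TW", {"zh-TW", "zh", "en"}))  # zh-tw
print(lookup("nn-NO", {"no", "en"}))            # no
print(lookup("fr-CA", {"en"}))                  # und
```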

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
Editor, draft-ietf-ltru-initial




More information about the Ietf-languages mailing list