Review period; Nepali and Oriya

Mark Davis ☕ mark at macchiato.com
Mon Aug 20 09:06:44 CEST 2012


That is not the interpretation used by CLDR; it is not just a 'best effort'.

This goes back to the origin of the macrolanguage model, which tries to
deal with language "splitting", where it has become clear that two or more
different languages are being represented in practice by the same code.
There are basically two different models for handling this:

   - *The German model.* Up until 2006, Swiss German was tagged with "de"
   (German). After 'gsw' was encoded specifically for it, content for Swiss
   German would be tagged with 'gsw'. After that point, content tagged with
   'de' would be presumed not to include Swiss German.
   - *The Chinese model.* Up until 2009, Cantonese was tagged with "zh"
   (Chinese)*. After 'yue' was encoded specifically for it, content for
   Cantonese would be tagged with 'yue'. After that point, content tagged with
   'zh' might or might not include Cantonese.

The German model suffers from having to migrate any important minority
content that was tagged with the old code. So for specialized usage (e.g.
bibliographic) it might not be as good. But for IT usage it is much better:
it maintains backward compatibility for the vast majority of uses of the
language codes ('de', 'zh', 'ar',...).

Thus CLDR uses the German model for all languages. For example, content
tagged with 'zh' is presumed not to include Cantonese; in other words, 'zh'
equals 'cmn' when using CLDR. To do that, it replaces each principal
encompassed language ('cmn') with its corresponding macrolanguage ('zh'). One
could, of course, reverse this in a given implementation by processing all
the CLDR data it uses and mapping 'zh' to 'cmn'.
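
A minimal sketch of that replacement and its reverse, in Python. The
function names and the table are hypothetical and are not taken from CLDR's
data files: the 'cmn' to 'zh' entry comes from the text above, and the 'arb'
to 'ar' entry is an assumed further example of the same pattern.

```python
# Illustrative table only. 'cmn' -> 'zh' is the example from this message;
# 'arb' -> 'ar' is an assumed entry following the same pattern.
ENCOMPASSED_TO_MACRO = {"cmn": "zh", "arb": "ar"}

# Reverse direction, for an implementation that prefers the encompassed code.
MACRO_TO_ENCOMPASSED = {macro: enc for enc, macro in ENCOMPASSED_TO_MACRO.items()}


def _replace_language(tag, table):
    """Swap the primary language subtag of a BCP 47 tag using `table`.

    Any remaining subtags (script, region, ...) are left untouched.
    """
    language, sep, rest = tag.partition("-")
    return table.get(language.lower(), language) + sep + rest


def to_cldr_form(tag):
    """Replace a principal encompassed language by its macrolanguage."""
    return _replace_language(tag, ENCOMPASSED_TO_MACRO)


def to_encompassed_form(tag):
    """Reverse mapping: replace the macrolanguage by the encompassed language."""
    return _replace_language(tag, MACRO_TO_ENCOMPASSED)
```

Under these assumptions, to_cldr_form("cmn-Hant-TW") gives "zh-Hant-TW",
while a tag such as "yue-HK" passes through unchanged because 'yue' is not
in the table.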

Now, that being said, if this group wants to have Nepali and Oriya be
macrolanguages, it is not really a problem for CLDR; it simply means more
entries in the tables. It will cause migration hassles for other
implementations that use BCP47, but that is not an issue with CLDR. The more
common the language,
the worse the hassles. For example, consider what would happen were ISO to
decide that 'en' really was a macrolanguage with 'ens' being Standard
English, and 'enz' being New Zealand English—how much software would
hiccough when it hit 'enz-GB'...
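
To make the hiccough concrete, here is a rough sketch of the kind of
truncation fallback many implementations use (simplified from the lookup
idea in RFC 4647; case handling and single-letter subtags are ignored). The
supported set and the default value are hypothetical, and 'enz' is the
made-up code from the paragraph above.

```python
# Hypothetical set of locales an application ships data for.
SUPPORTED = {"en", "en-GB", "de"}


def lookup(tag, supported, default="und"):
    """Return the longest truncated prefix of `tag` found in `supported`.

    Subtags are dropped from the right one at a time; if nothing matches,
    `default` is returned.
    """
    parts = tag.split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in supported:
            return candidate
        parts.pop()
    return default
```

With this sketch, "en-NZ" falls back to "en", but "enz-GB" truncates to
"enz", matches nothing, and ends up at the default: the software never
reaches its English data unless it is also taught that 'enz' belongs
under 'en'.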

Mark <https://plus.google.com/114199149796022210033>

* or the (non-canonical) language tag zh-cmn introduced in 2005
*— Il meglio è l’inimico del bene —*



On Sun, Aug 5, 2012 at 1:15 AM, Doug Ewell <doug at ewellic.org> wrote:

> This works well or not so well, depending on the meaning of "usually" and
> on whether one happens to be dealing with the "usually" case. Locations and
> circumstances exist in which "Chinese" definitely does not mean "Mandarin."
>

