Review period; Nepali and Oriya

Gordon P. Hemsley gphemsley at gmail.com
Sun Aug 5 00:34:50 CEST 2012


On Sat, Aug 4, 2012 at 5:32 PM, Doug Ewell <doug at ewellic.org> wrote:
> Gordon P. Hemsley wrote:
>> Given that CLDR tends to map ISO 639 macrolanguage codes onto a single
>> particular microlanguage, maybe we should synchronize with them? I
>> don't know how feasible (or logical) it would be, given that they
>> already have a lot more mappings than there are extlangs in the
>> registry, but it might be something to consider.
>
> I don't know which of the many CLDR tables has this data. Mark Davis can
> probably help here. But the Registry can't simply map "Chinese" onto
> "Mandarin," if that's what you mean; any of the other encompassed languages
> like Cantonese might also be denoted in tags by 'zh'.

Ah, sorry, forgot to include the link:
http://unicode.org/repos/cldr/trunk/common/supplemental/supplementalMetadata.xml

Every <languageAlias reason="macrolanguage" /> maps a microlanguage
code (@type) to a macrolanguage code (@replacement). This seems to
serve the same purpose as extlangs, except that it eliminates the
ability to use macrolanguage codes in their macrolanguage sense.

For example, 'cmn' cannot be used for Mandarin; instead, 'cmn' is
mapped to 'zh', on the grounds that "Chinese" usually really means
"Mandarin", so that is now what 'zh' means.
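
To make the mapping concrete, here is a rough Python sketch of how those
aliases could be read and applied. The embedded XML is only a small
illustrative excerpt in the shape of supplementalMetadata.xml, not the
real file, and the function names are my own invention, not anything
CLDR provides.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt only; the real supplementalMetadata.xml is much larger.
SAMPLE = """\
<supplementalData>
  <metadata>
    <alias>
      <languageAlias type="cmn" replacement="zh" reason="macrolanguage"/>
      <languageAlias type="arb" replacement="ar" reason="macrolanguage"/>
      <languageAlias type="iw" replacement="he" reason="deprecated"/>
    </alias>
  </metadata>
</supplementalData>
"""


def macrolanguage_aliases(xml_text):
    """Return {microlanguage code: macrolanguage code} for macrolanguage aliases."""
    root = ET.fromstring(xml_text)
    return {
        el.get("type"): el.get("replacement")
        for el in root.iter("languageAlias")
        if el.get("reason") == "macrolanguage"
    }


def replace_primary_subtag(tag, aliases):
    """Hypothetical helper: swap the primary language subtag if it has an alias."""
    primary, sep, rest = tag.partition("-")
    return aliases.get(primary.lower(), primary) + sep + rest


aliases = macrolanguage_aliases(SAMPLE)
print(replace_primary_subtag("cmn-Hans", aliases))  # 'cmn' is replaced by 'zh'
print(replace_primary_subtag("zh", aliases))        # 'zh' has no alias, left alone
```

Note that the mapping only goes one way: once 'cmn' has been folded into
'zh', nothing in the tag says whether the original was the encompassed
language or the macrolanguage.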

A slightly more detailed explanation is available in UTS #35:
http://www.unicode.org/reports/tr35/#Language_Locale_Field_Definitions

Whether that is the right way or wrong way to do things is up for
debate. But it does have its uses; for example, it's pretty handy if
you have legacy locales like 'zh'—they are assumed to be Mandarin
anyway, so no need to switch to 'cmn'.

>> Were there new macrolanguages that were
>> deliberately NOT registered as extlangs (past the original
>> registration)?
>
> (Point of terminology: extlangs represent encompassed languages, like
> "Cantonese," not the macrolanguages that encompass them, like "Chinese.")
>
> Sure, there have been some. For example, in 2010 ISO 639-3 converted the
> individual language code element 'bnc' for "Central Bontoc" to a
> macrolanguage called "Bontok," encompassing five languages, including a new
> code element 'lbk' for "Central Bontok" (note incidental spelling change).
> The ietf-languages list and Reviewer did not believe that 'bnc' had
> previously been used in BCP 47 contexts to refer specifically to Central
> Bontoc in some cases, and generally to all five Bontok languages in other
> cases, so there was no creation of five new extlangs under 'bnc'.
>
> At the same time, however, it was decided that 'lv' had indeed been used
> both for Latvian specifically and for the general sense of "Latvian" which
> included Latgalian, so a different decision was made there.
>
> The question regarding Nepali and Oriya could be thought of as whether tag
> usage for these languages has been more like Bontok, or more like Latvian.

Well, I have no additional information to offer about that. :)

-- 
Gordon P. Hemsley
me at gphemsley.org
http://gphemsley.org/
http://gphemsley.org/blog/

