Wikipedia tagging

Mark Davis ☕ mark at
Mon Jul 29 20:56:27 CEST 2013

In particular, in CLDR (and ICU) we use the 'shortest form' as the
canonical form for language tagging. So as Markus says, we tag "en-Latn-US"
content as "en", and "zh-Hans-CN" (or "cmn-Hans-CN") content as simply "zh".
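
To make the 'shortest form' idea concrete, here is a minimal Python sketch of
removing likely subtags. The LIKELY table, the ALIASES entry and all function
names are assumptions made up for this illustration; real implementations read
the full likely-subtags data from CLDR, and this is not ICU's code.

```python
# Toy likely-subtags table (illustrative only; the real data lives in CLDR).
LIKELY = {
    "en": ("en", "Latn", "US"),
    "de": ("de", "Latn", "DE"),
    "zh": ("zh", "Hans", "CN"),
    "zh-TW": ("zh", "Hant", "TW"),
    "zh-Hant": ("zh", "Hant", "TW"),
}
ALIASES = {"cmn": "zh"}  # assumed alias so that "cmn-..." is treated as "zh-..."


def parse(tag):
    """Split a tag into (language, script, region); missing parts are ""."""
    parts = tag.split("-")
    lang = ALIASES.get(parts[0].lower(), parts[0].lower())
    script = region = ""
    for p in parts[1:]:
        if len(p) == 4 and p.isalpha():
            script = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            region = p.upper()
    return lang, script, region


def join(lang, script, region):
    """Build a tag from its non-empty parts."""
    return "-".join(x for x in (lang, script, region) if x)


def maximize(tag):
    """Fill in the missing script/region from the toy table."""
    lang, script, region = parse(tag)
    keys = (join(lang, script, region), join(lang, "", region),
            join(lang, script, ""), lang)
    for key in keys:
        if key in LIKELY:
            _, s, r = LIKELY[key]
            return lang, script or s, region or r
    return lang, script, region


def minimize(tag):
    """Return the shortest tag that maximizes to the same full tag."""
    full = maximize(tag)
    lang, script, region = full
    for cand in (lang, join(lang, "", region), join(lang, script, "")):
        if maximize(cand) == full:
            return cand
    return join(*full)


for t in ("en-Latn-US", "zh-Hans-CN", "cmn-Hans-CN", "zh-Hant-TW"):
    print(t, "->", minimize(t))
```

In this sketch "zh-Hant-TW" shortens only to "zh-TW", because "zh" alone would
expand back to "zh-Hans-CN".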

One clarification. When looking up resources, we have a matching process
that takes a list of desired user languages, and finds the best match in the
list of supported languages. So if the user's desired languages are "ja-JP,
zh-CN", and the supported languages are "zh, en", then the match would be
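
As a rough illustration of matching desired languages against supported ones,
here is a Python sketch of the simple truncation lookup from RFC 4647
(section 3.4). It is deliberately not the CLDR/ICU matcher described above,
which is more elaborate; the function name lookup and its parameters are
made up for this example.

```python
def lookup(desired, supported, default=None):
    """Return the first supported tag found by truncating each desired tag.

    desired:   language tags in the user's priority order
    supported: language tags the application has resources for
    """
    available = {tag.lower(): tag for tag in supported}
    for wanted in desired:
        parts = wanted.lower().split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in available:
                return available[candidate]
            parts.pop()  # drop the last subtag and try again
            # RFC 4647: a single-character subtag left at the end goes too
            if parts and len(parts[-1]) == 1:
                parts.pop()
    return default


print(lookup(["ja-JP", "zh-CN"], ["zh", "en"]))
```

With these lists the sketch returns "zh": neither "ja-JP" nor "ja" is
supported, and "zh-CN" truncates to "zh".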

Mark <>
*— Il meglio è l’inimico del bene —*

On Mon, Jul 29, 2013 at 1:03 AM, Markus Scherer < at> wrote:

> On Sun, Jul 28, 2013 at 2:00 PM, Peter Constable <petercon at> wrote:
>> Can anyone explain why it is that Wikipedia is tagging Simplified
>> Chinese content as simply “zh”?
>> [...]
>> That’s not good practice and will lead to users encountering buggy
>> behaviour (e.g., content being displayed using the wrong fonts).
> I think it is very common that developers assume that "zh" defaults to
> "zh-Hans-CN", just like they assume that "en" defaults to "en-Latn-US",
> "de" to "de-Latn-DE", etc.
> markus

More information about the Ietf-languages mailing list