zh-CN etc

Phillips, Addison addison at lab126.com
Thu Jun 13 16:42:48 CEST 2013

Hi Misha,

“zh-CN” is a valid BCP 47 language tag. Users “hanging on to” the region subtag as an indicator of script is a common problem with Chinese tagging, as you’re obviously aware. CLDR specifies an “add likely subtags” algorithm that transforms “zh-CN” to “zh-Hans-CN” as an aid to matching (see [1]). A number of us have implemented it because of existing legacy language and/or locale tags.
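To illustrate the idea, here is a rough Python sketch of the lookup. It is a simplification, not the full algorithm in [1]: the table is a tiny hand-picked subset of the CLDR likely-subtags data, the names `LIKELY_SUBTAGS`, `parse` and `add_likely_subtags` are my own, and only simple language[-script][-region] tags are handled.

```python
# Tiny illustrative subset of the likely-subtags data; the real table ships with CLDR.
LIKELY_SUBTAGS = {
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
    "zh-HK": "zh-Hant-HK",
    "zh-Hant": "zh-Hant-TW",
}


def parse(tag):
    """Split a simple language[-script][-region] tag (variants etc. are ignored)."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()          # e.g. "hans" -> "Hans"
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()          # e.g. "cn" -> "CN"
    return language, script, region


def add_likely_subtags(tag):
    """Fill in a missing script and/or region from the most specific table match."""
    language, script, region = parse(tag)

    # Try the most specific key first, then fall back to less specific ones.
    candidates = []
    if script and region:
        candidates.append(f"{language}-{script}-{region}")
    if region:
        candidates.append(f"{language}-{region}")
    if script:
        candidates.append(f"{language}-{script}")
    candidates.append(language)

    for key in candidates:
        if key in LIKELY_SUBTAGS:
            _, match_script, match_region = parse(LIKELY_SUBTAGS[key])
            # Subtags already present in the input win over the table's values.
            return "-".join([language, script or match_script, region or match_region])

    return tag  # no data for this tag in the subset: leave it unchanged


print(add_likely_subtags("zh-CN"))
print(add_likely_subtags("zh-TW"))
```

With this subset, “zh-CN” falls back to the bare “zh” entry and picks up “Hans”, while “zh-TW” matches its own entry and picks up “Hant”, so a matcher can compare the maximized forms rather than guessing script from region.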

Regarding what to call the language+region tag, ISO639-1 seems an unlikely choice: the “-CN” part isn’t 639-ish, and a 639-2 or 639-3 subtag could potentially appear in the language position. RFC1766 would work, although that isn’t the most recent pre-current RFC; that would be RFC3066.

In any case, I don’t think it’s a good idea to encourage the thought that there is “another” language tagging RFC out there. Perhaps something like “ALT-BCP47” or “BCP47-ALIAS” or “BCP47-LEGACY”?

Best Regards,


Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

[1] http://www.unicode.org/reports/tr35/#Likely_Subtags

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Misha.Wolf at thomsonreuters.com
Sent: Thursday, June 13, 2013 6:56 AM
To: ietf-languages at iana.org
Subject: zh-CN etc

Hi folks,

I manage various in-house taxonomies and would appreciate your advice.  For each “object” we have our own numeric PermID plus (optionally) one or more Identifiers of Identifier Types suited to the Object Type.  For languages, we have an Identifier Type “BCP47”.  In the case of Simplified Chinese, the corresponding Identifier contains the value “zh-Hans”.  We’ve been approached by a group which wants us to associate the value “zh-CN” with Simplified Chinese.  We’ve refused to do this using the Identifier Type “BCP47” but have offered to create another Identifier Type to hold this value and similar values which may, from time to time, be needed.  I don’t know what to call this Identifier Type.  I’ve considered both “ISO639-1” and “RFC1766”.  Your advice would be appreciated.


