zh-CN etc

Peter Constable petercon at microsoft.com
Fri Jun 28 18:54:09 CEST 2013

ISO 639 has never given recommendations involving the specific syntax ll-CC; it has merely suggested using an ISO 639 identifier in conjunction with ISO 3166 identifiers, as in “ll cc”.

RFC 1766 allowed for ISO 639-1 alpha-2 plus ISO 3166-1 alpha-2 combinations. RFC 3066 added use of ISO 639-2 alpha-3. The big change between zh-CN and zh-Hans _sanctioned within BCP 47_ came with the explicit addition of script subtags in RFC 4646. However, registration of tags with script subtags was permitted prior to RFC 4646. In fact, zh-Hans was registered in the RFC 3066 time frame. It could not have been registered in the initial RFC 1766 time frame since ISO 15924 did not yet exist (it was published in the final stages of preparation of RFC 3066).

So, “zh-Hans” would not have been used in the time frame in which RFC 1766 was the current revision of BCP 47. On that basis, if you wanted to refer to the other identifier type as RFC1766, that wouldn’t seem to me to be unreasonable.


From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Misha.Wolf at thomsonreuters.com
Sent: June 13, 2013 8:14 AM
To: addison at lab126.com; ietf-languages at iana.org
Subject: RE: zh-CN etc

Hi Addison,

The issue is the distinction between the instructions:
1  use this Identifier to indicate that language
2  interpret this Identifier as indicating that language

Our BCP47 Identifier Type is used to determine the Identifier to be used for 1 above.  If we allowed a language to have two Identifiers of Type BCP47, then systems wouldn’t know which to use for case 1.

The reason I was thinking of ISO639-1 is that it does (or used to -- I no longer have a copy of the standard) explicitly recommend the use of ISO 3166-1 Country codes, to form Language tags of the form ll-CC.

The reason I was thinking of RFC1766 is that it based itself on ISO 639-1 and so inherited this, as follows:

   In the second subtag:

    -    All 2-letter codes are interpreted as ISO 3166 alpha-2
         country codes denoting the area in which the language is
         used.

    -    Codes of 3 to 8 letters may be registered with the IANA by
         anyone who feels a need for it, according to the rules in
         chapter 5 of this document.

   The information in the subtag may for instance be:

    -    Country identification, such as en-US (this usage is
         described in ISO 639)

But your suggestion for using BCP47, coupled with something like ALT, ALIAS or LEGACY strikes me as the best way forward ☺.
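
To make the use/interpret distinction concrete, here is a minimal sketch in Python. It assumes one preferred identifier of type BCP47 per language, plus any number of input-only values of a separate legacy type. The class names, the method names and the PermID value 1001 are all hypothetical, invented for illustration; none of them reflect our actual systems.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageRecord:
    perm_id: int                  # hypothetical numeric PermID
    bcp47: str                    # the single identifier to emit (case 1)
    legacy: set = field(default_factory=set)  # accepted on input only (case 2)


class LanguageRegistry:
    def __init__(self):
        self._by_id = {}
        self._by_tag = {}

    def add(self, record):
        self._by_id[record.perm_id] = record
        # Index the preferred identifier and every legacy value for lookup.
        for tag in {record.bcp47, *record.legacy}:
            self._by_tag[tag.lower()] = record

    def identifier_for(self, perm_id):
        """Case 1: which identifier should be used for this language?"""
        return self._by_id[perm_id].bcp47

    def interpret(self, tag):
        """Case 2: which language does this identifier indicate, if any?"""
        record = self._by_tag.get(tag.lower())
        return record.perm_id if record else None


registry = LanguageRegistry()
registry.add(LanguageRecord(perm_id=1001, bcp47="zh-Hans", legacy={"zh-CN"}))

print(registry.identifier_for(1001))  # zh-Hans
print(registry.interpret("zh-CN"))    # 1001
```

Because each record carries exactly one value of type BCP47, case 1 always has a single answer, while the legacy values only widen what is accepted for case 2. Treating “zh-CN” as a synonym for Simplified Chinese reflects the requesting group's usage, not the meaning BCP 47 itself assigns to that tag.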


From: Phillips, Addison [mailto:addison at lab126.com]
Sent: 13 June 2013 15:43
To: Wolf, Misha (TR Technology); ietf-languages at iana.org
Subject: RE: zh-CN etc

Hi Misha,

“zh-CN” is a valid BCP 47 language tag. Users “hanging on to” the use of the region subtag as indicating script is not an uncommon problem with Chinese tagging, as obviously you’re aware. CLDR specifies an “add likely subtags” algorithm that transforms “zh-CN” to “zh-Hans-CN” as an aid to matching (see [1]), which a number of us have implemented because of existing legacy language and/or locale tags.
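
As a rough illustration of the idea, here is a simplified sketch in Python. It is not the full CLDR algorithm: the table is a tiny hand-picked subset of the likely-subtags data, the lookup order is simplified, and the helper names (parse, join, add_likely_subtags) are made up for this example. It also ignores variants, extensions and the "und" cases.

```python
# Illustrative subset only; the real data ships with CLDR.
LIKELY_SUBTAGS = {
    "zh": "zh-Hans-CN",
    "zh-TW": "zh-Hant-TW",
    "zh-HK": "zh-Hant-HK",
    "zh-MO": "zh-Hant-MO",
    "zh-Hant": "zh-Hant-TW",
}


def parse(tag):
    """Split a simple language[-script][-region] tag into three parts."""
    parts = tag.replace("_", "-").split("-")
    language, script, region = parts[0].lower(), "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return language, script, region


def join(*subtags):
    """Join the non-empty subtags with hyphens."""
    return "-".join(s for s in subtags if s)


def add_likely_subtags(tag):
    language, script, region = parse(tag)
    # Try the most specific key first, then progressively less specific ones.
    candidates = [
        join(language, script, region),
        join(language, region),
        join(language, script),
        language,
    ]
    for key in candidates:
        match = LIKELY_SUBTAGS.get(key)
        if match:
            _, m_script, m_region = parse(match)
            # Keep what the caller supplied; fill only the missing subtags.
            return join(language, script or m_script, region or m_region)
    return tag  # nothing known about this tag; leave it unchanged


print(add_likely_subtags("zh-CN"))  # zh-Hans-CN
print(add_likely_subtags("zh-TW"))  # zh-Hant-TW
```

With both the stored tag and the incoming tag expanded this way, “zh-CN” and “zh-Hans” both come out as “zh-Hans-CN” and can be matched directly.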

Regarding what to call the language+region tag, ISO639-1 seems an unlikely choice, as the “-CN” part isn’t 639-ish and since potentially a 639-2 or 639-3 subtag could appear there. RFC1766 would work, although that isn’t the most recent pre-current RFC. That would be RFC3066.

In any case, I don’t think it’s a good idea to encourage the thought that there is “another” language tagging RFC out there, though. Perhaps something like “ALT-BCP47” or “BCP47-ALIAS” or “BCP47-LEGACY”?

Best Regards,


Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

[1] http://www.unicode.org/reports/tr35/#Likely_Subtags

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Misha.Wolf at thomsonreuters.com
Sent: Thursday, June 13, 2013 6:56 AM
To: ietf-languages at iana.org
Subject: zh-CN etc

Hi folks,

I manage various in-house taxonomies and would appreciate your advice.  For each “object” we have our own numeric PermID plus (optionally) one or more Identifiers of Identifier Types suited to the Object Type.  For languages, we have an Identifier Type “BCP47”.  In the case of Simplified Chinese, the corresponding Identifier contains the value “zh-Hans”.  We’ve been approached by a group which wants us to associate the value “zh-CN” with Simplified Chinese.  We’ve refused to do this using the Identifier Type “BCP47” but have offered to create another Identifier Type to hold this value and similar values which may, from time to time, be needed.  I don’t know what to call this Identifier Type.  I’ve considered both “ISO639-1” and “RFC1766”.  Your advice would be appreciated.



This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website.<http://thomsonreuters.com/prof_disclosures/>

This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
