Language / Locale identifiers

Sat Dec 11 20:21:15 CET 2010

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell

> This is actually a much more important and relevant announcement 
> than Mark made it out to be.  This is the publication of the very first 
> extension RFC for BCP 47, as described in Section 3.7 of RFC 5646.  

Indeed, that is noteworthy. 

> The new extension, identified by the singleton 'u', allows a wide 
> variety of locale-related data items to be included in language tags, 
> completely within the framework of BCP 47.

From the perspective of BCP 47, such tags are appropriately described as "language tags" with an extension not interpretable in terms of BCP 47. In other words, a language tag with some extra black-box stuff. But in terms of the extension, such tags are not "language tags" but rather are "locale identifiers".

So...

> For example, in an environment where language tags are used as locale 
> identifiers, 

That should be "in an environment in which locale tags are used".

And this characterization of the meaning

>	he-IL-u-ca-hebrew
> (Hebrew as used in Israel, using the traditional Hebrew calendar)

is also not quite right: "he-IL-u-ca-hebrew" denotes the _locale_ 'Hebrew-Israel with Hebrew calendar'. One can infer the language 'Hebrew as used in Israel' from that, but in the context in which the extension is interpretable this is not a _language tag_ but rather a _locale identifier_.
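To make the distinction concrete, here is a minimal Python sketch that splits such an identifier at the 'u' singleton into the part from which the language can be inferred and the locale keywords. The function name split_u_extension is hypothetical, and the sketch assumes the tag carries at most one extension (the 'u' extension) with nothing following it; it checks nothing against any registry.

```python
def split_u_extension(tag):
    """Split a tag at the 'u' singleton.

    Returns (language_part, keywords), where keywords maps each
    two-character key to its type subtags joined by "-".
    Assumption: 'u' is the only extension and nothing follows it.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "u" not in lowered:
        return tag, {}
    i = lowered.index("u")
    keywords = {}
    key = None
    for s in lowered[i + 1:]:
        if len(s) == 2:
            # A two-character subtag starts a new keyword.
            key = s
            keywords[key] = []
        elif key is not None:
            # Longer subtags belong to the current keyword's type.
            keywords[key].append(s)
    language_part = "-".join(subtags[:i])
    return language_part, {k: "-".join(v) for k, v in keywords.items()}


language_part, keywords = split_u_extension("he-IL-u-ca-hebrew")
print(language_part, keywords)
```

Under those assumptions, "he-IL" is what a BCP 47 processor can interpret as a language tag, while the "ca" keyword is meaningful only to a consumer that understands the extension.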

A detail regarding the Unicode language and locale identifiers worth pointing out is that not all valid BCP 47 language tags are valid Unicode language IDs, and there is one special-case ID ("root") that is permitted but is not a valid BCP 47 language tag. The syntax is:

unicode_language_id = "root"
/ unicode_language_subtag
  [sep unicode_script_subtag]
  [sep unicode_region_subtag]
  *(sep unicode_variant_subtag)

where sep is "-" and each unicode_X_subtag is any LSTR (Language Subtag Registry) subtag of type X.
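As a rough illustration of the syntax above, the following Python sketch checks only the shape of an ID. The subtag shapes (2-3 letters for language, 4 letters for script, 2 letters or 3 digits for region, 5-8 alphanumerics or a digit plus 3 alphanumerics for variants) are taken from the RFC 5646 ABNF as an assumption; membership in the registry is not checked, and the function name is hypothetical.

```python
import re

# Shape-only pattern for the grammar above; "root" is the special case.
_LANGUAGE_ID = re.compile(
    r"""
    root
    |
    [a-z]{2,3}                                  # language subtag
    (?:-[a-z]{4})?                              # optional script subtag
    (?:-(?:[a-z]{2}|[0-9]{3}))?                 # optional region subtag
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*    # zero or more variant subtags
    """,
    re.VERBOSE | re.IGNORECASE,
)


def looks_like_unicode_language_id(text):
    """Return True if text has the shape of a Unicode language ID."""
    return _LANGUAGE_ID.fullmatch(text) is not None


for candidate in ["root", "he-IL", "zh-Hant-TW", "sl-rozaj-biske",
                  "i-klingon", "he-IL-u-ca-hebrew"]:
    print(candidate, looks_like_unicode_language_id(candidate))
```

In this sketch "i-klingon", a grandfathered BCP 47 tag, is rejected, while "root" is accepted. "he-IL-u-ca-hebrew" is also rejected, since with the extension it is a locale identifier rather than a language ID.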

Peter