Language / Locale identifiers

Fri Dec 10 20:36:04 CET 2010

Mark Davis 🍵 <mark at macchiato dot com> wrote:

> For those of you interested in language and locale identifiers
>
> The RFC for Unicode locale identifiers was just released:
>
>    - http://tools.ietf.org/html/rfc6067
>    - See also
>    http://unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers

This is actually a much more important and relevant announcement than
Mark made it out to be.  This is the publication of the very first
extension RFC for BCP 47, as described in Section 3.7 of RFC 5646.  The
new extension, identified by the singleton 'u', allows a wide variety of
locale-related data items to be included in language tags, completely
within the framework of BCP 47.

For example, in an environment where language tags are used as locale
identifiers, it is now fully conformant to create tags like these:

	he-IL-u-ca-hebrew
(Hebrew as used in Israel, using the traditional Hebrew calendar)

	de-u-co-phonebk
(German, collated according to phone book ordering)

	es-u-cu-mxn
(Spanish, using Mexican pesos)

	fa-Arab-u-nu-arabext
(Farsi, written in the Arabic script and with Extended Arabic-Indic digits)

	en-u-tz-usden
(English, in the U.S. Mountain time zone)

Of course, many of these extensions would be considered inappropriate
for "normal" language tagging use, but some (like collation and
numerals) can certainly be viewed as "language attributes of content"
and might be appropriate for use outside the CLDR framework.

Developers of software that generates or interprets BCP 47 language tags
may want to add support for extension U, though BCP 47 does not require
them to do so.
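
As a rough illustration of what minimal support might look like, here is a
Python sketch that pulls the key/type pairs out of the 'u' extension of a
tag. The function name unicode_extension_keywords is hypothetical, and the
parsing is deliberately simplified: it assumes the tag is already
well-formed, ignores any attributes that precede the first key, and does no
validation against the CLDR key and type lists.

```python
def unicode_extension_keywords(tag):
    """Return {key: type} for the 'u' extension of a BCP 47 tag.

    Simplified sketch: assumes a well-formed tag and does not validate
    keys or types against CLDR data.
    """
    subtags = tag.lower().split("-")
    collected = {}
    in_u = False
    key = None
    # Skip the primary language subtag; singletons only appear after it.
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton ends the 'u' extension; 'x' starts private use,
            # after which nothing is interpreted as an extension.
            if in_u or subtag == "x":
                break
            in_u = (subtag == "u")
            continue
        if not in_u:
            continue
        if len(subtag) == 2:
            # Two-character subtags inside the extension are keys.
            key = subtag
            collected[key] = []
        elif key is not None:
            # Longer subtags after a key make up that key's type.
            collected[key].append(subtag)
        # Subtags before the first key are attributes; ignored here.
    return {k: "-".join(v) for k, v in collected.items()}


for example in ("he-IL-u-ca-hebrew", "de-u-co-phonebk", "en-u-tz-usden"):
    print(example, unicode_extension_keywords(example))
```

Tags are lowercased before parsing because BCP 47 tags are
case-insensitive. A tag with no 'u' extension simply yields an empty
dictionary.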

See Unicode Technical Standard #35 for more information.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s