Language / Locale identifiers
petercon at microsoft.com
Sun Dec 12 19:28:32 CET 2010
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
>> From the perspective of BCP 47, such tags are appropriately described
>> as "language tags" with an extension not interpretable in terms of BCP
>> 47. In other words, a language tag with some extra black-box stuff.
>> But in terms of the extension, such tags are not "language tags" but
>> rather are "locale identifiers".
> Section 3.7 of 5646 says extension subtags "are reserved for the
> generation of identifiers that contain a language component and are
> compatible with applications that understand language tags."
What I said above is consistent with that.
> A tag like "he-IL-u-ca-hebrew" is indeed a BCP 47-conformant identifier,
> functionally equivalent to a language tag,
Not just functionally equivalent to one: it is itself a genuine BCP 47 language tag, just one with a non-language extension.
>>> For example, in an environment where language tags are used as locale
>> That should be, "in an environment in which locale tags are used"
> If these were not intended to be valid language tags, there would have
> been no point in creating RFC 6067. Plus, as I said, at least a few of the
> extension keys do represent language-identification data.
I didn't say that they weren't intended to be valid language tags. I was only saying that in environments in which the extension is understood, these are not considered language tags but rather locale IDs.
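To illustrate the distinction, here is a minimal Python sketch of how an application might separate the language portion of such a tag from its extensions. The function name split_extensions is hypothetical, and the logic is a simplification: it assumes a well-formed tag, treats any single-character subtag after the first position as an extension singleton, and does not handle grandfathered tags.

```python
def split_extensions(tag):
    """Split a tag into (language_part, extensions).

    Simplified sketch: assumes a well-formed tag and ignores
    grandfathered tags such as "i-klingon".
    """
    subtags = tag.split("-")
    language_part = []
    extensions = {}
    current = None  # singleton of the extension currently being read
    for index, subtag in enumerate(subtags):
        if len(subtag) == 1 and index > 0:
            current = subtag.lower()
            if current == "x":
                # Private use: everything after "x" belongs to it.
                extensions["x"] = subtags[index + 1:]
                break
            extensions[current] = []
        elif current is None:
            language_part.append(subtag)
        else:
            extensions[current].append(subtag)
    return "-".join(language_part), extensions


language, extensions = split_extensions("he-IL-u-ca-hebrew")
print(language)    # he-IL
print(extensions)  # {'u': ['ca', 'hebrew']}
```

An application that does not understand the "u" extension can work with "he-IL" alone and treat the remainder as opaque; one that does understand it would interpret the whole string as a locale ID.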
>> And this characterization of the meaning
>>> (Hebrew as used in Israel, using the traditional Hebrew calendar)
>> Is also not quite right: "he-IL-u-ca-hebrew" denotes the _locale_
>> 'Hebrew-Israel with Hebrew calendar'. One can infer the language
>> 'Hebrew as used in Israel' from that, but in the context in which the
>> extension is interpretable this is not a _language tag_ but rather a
>> _locale identifier_.
> I concede this relatively minor distinction. Calendar information is
> clearly about locales and not languages. Nevertheless, the tag itself
> —which is obviously intended to identify a locale or locale setting —
> is now syntactically valid as a language tag, in a way that was not true
> a week ago.
>> A detail regarding the Unicode language and locale identifiers worth
>> pointing out is that not all valid BCP 47 language tags are valid
>> Unicode language IDs, and there is a special-case ID that is permitted
>> that is not a valid BCP 47 language tag. The syntax is:
>> / unicode_language_subtag
>> [sep unicode_script_subtag]
>> [sep unicode_region_subtag]
>> *(sep unicode_variant_subtag)
>> where sep is "-" and unicode_language_X is any LSTR subtag of type X.
> This can't be the most current spec, since it doesn't provide for extension
> U at all.
> The latest version of the spec (1.9)...
This is version 1.9 of that spec. Note that it defines both unicode_language_id and unicode_locale_id. The latter includes the U extension; the former does not.
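As a rough illustration of that difference, the Python sketch below checks only the shape of the language / script / region / variant sequence quoted above. It is not the spec's definition: the subtag lengths are assumed from the BCP 47 syntax, no lookup against the subtag registry is performed, and the names LANGUAGE_ID_SHAPE and looks_like_language_id are hypothetical.

```python
import re

# Shape-only pattern for: language [sep script] [sep region] *(sep variant)
# Subtag lengths are assumptions taken from the BCP 47 syntax.
LANGUAGE_ID_SHAPE = re.compile(
    r"""
    ^(?P<language>[a-z]{2,3}|[a-z]{5,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    $
    """,
    re.IGNORECASE | re.VERBOSE,
)


def looks_like_language_id(tag):
    """Return True if tag has the shape of a language ID with no extensions."""
    return LANGUAGE_ID_SHAPE.match(tag) is not None


print(looks_like_language_id("he-IL"))              # True
print(looks_like_language_id("he-IL-u-ca-hebrew"))  # False
```

Under these assumptions, "he-IL" fits the language ID shape, while "he-IL-u-ca-hebrew" does not, because the "u" singleton and what follows it fall outside the four productions.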
> I don't recommend that people start tagging their Web pages
> with time zone or collation-strength information.