Language tags that denote writing systems

Doug Ewell dewell@adelphia.net
Wed, 11 Dec 2002 08:39:05 -0800


Jon Hanna <jon at spin dot ie> wrote:

> That *is* the problem with yi-Latn -> yi! yi-Latn -> yi is correct,
> but there is no way to know that without recording the fact somewhere.
> Since one cannot assume any hierarchical system within the tags (even
> the case where an ISO 3166 and ISO 639 are combined gives us the
> totally separate sgn-IE, sgn-GB, sgn-US etc.) and must treat them as
> opaque identifiers a conversion that may be necessary (same language
> different script, or same language and one isn't written) is going to
> be problematic.

There is a hierarchical system.  Isn't that exactly what Section 2.5,
"Language-range," in RFC 3066 is for?

"A language-range matches a language-tag if it exactly equals the tag,
or if it exactly equals a prefix of the tag such that the first
character following the prefix is '-'."

Applications that wish to equate yi-Latn (or yi-Hebr) with yi can use
this mechanism, instead of requiring the tags being compared to be
identical.
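
As a rough illustration only, the matching rule quoted above could be
sketched in Python along the following lines.  The function name
language_range_matches is hypothetical, not taken from any library.
The sketch assumes case-insensitive comparison (RFC 3066 treats tags as
case insensitive) and handles the special range "*", which Section 2.5
says matches any tag.

```python
def language_range_matches(language_range: str, language_tag: str) -> bool:
    """Return True if language_range matches language_tag.

    Hypothetical helper sketching the RFC 3066 Section 2.5 rule: a range
    matches a tag if it equals the tag, or equals a prefix of the tag
    that is immediately followed by '-'.
    """
    rng = language_range.lower()  # assumption: compare case-insensitively
    tag = language_tag.lower()
    if rng == "*":
        # The special range "*" matches any tag.
        return True
    # Exact match, or prefix match ending at a subtag boundary.
    return tag == rng or tag.startswith(rng + "-")


if __name__ == "__main__":
    for rng, tag in [("yi", "yi-Latn"), ("yi", "yi-Hebr"), ("yi-Latn", "yi")]:
        print(rng, tag, language_range_matches(rng, tag))
```

Under this sketch the range yi matches both yi-Latn and yi-Hebr, while
the range yi-Latn does not match the bare tag yi, because the rule only
lets a shorter range match a longer tag, never the reverse.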

There is a note in Section 2.5 to the effect that language tags thus
matched are not guaranteed to be mutually intelligible.  This applies to
script subtags as well as any other subtags.  For example, the common
"zh-" prefix does not necessarily imply that zh-cn-11 (or zh-guoyu),
Mandarin, is mutually intelligible with zh-cn-31 (or zh-wuu), Wu.
Similarly, the common "yi-" prefix doesn't imply that the target
audience can read yi-Latn and yi-Hebr with equal facility.  (I certainly
can't.)  But the strategy is sound: applications have a standard
mechanism for considering yi-Latn and yi-Hebr equivalent, if they choose
to use it.

-Doug Ewell
 Fullerton, California