zh-****-** tags

Peter Constable petercon at microsoft.com
Fri Mar 18 15:39:58 CET 2005

> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Frank Ellermann

> HK and MO belong to CN, it makes no sense to abuse these
> region codes in 3066 tags.

This is not an abuse. The fact that these regions happen to belong to CN
does not mean that HK and MO are not perfectly acceptable region IDs.

>  If there really is something
> like zh-Hant-MO or zh-Hans-HK, then why not use a proper
> name for it, which is not _apparently_ restricted to MO
> or HK ?

The ID zh is ambiguous in its usage. On the one hand, it is used as a
cover ID for distinct Chinese languages, such as Mandarin and Cantonese,
and for those distinctions we have tags such as zh-guoyu and zh-yue. On
the other hand, a large proportion of materials tagged using zh or
derivatives zh-Hant and zh-Hans is specifically Mandarin - although zh
has a broader denotation than Mandarin, it just happens that most
content tagged zh is Mandarin.

For Mandarin, there are differences in usage between these different
regions. Thus it is not unreasonable to use zh-CN, zh-HK, etc. to
indicate these distinctions in regional sub-varieties of Mandarin; and
since there is also an orthogonal distinction between simplified and
traditional characters, tags that combine both region and script IDs are
in order.

> It's also not the same style as
> in en-GB-oed etc., why zh-hanZ-XY instead of zh-XY-hanZ ?

Apparently you have either just joined the list or have not been paying
attention to several threads over the past two or so years. When
selecting content to fit a user's request or needs, script distinctions
are almost always going to matter more than regional dialect or spelling
variations. (E.g. I can read either "color" or "colour"; I can't read
that word in Gregg shorthand.) Since many deployed matching processes
use an algorithm involving some form of left-prefix matching, various
qualifiers within the proposed tags are combined left-to-right in order
of priority.
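
To illustrate the point about left-prefix matching, here is a rough
sketch in Python. It is not taken from any particular deployed
implementation; the function names (prefix_matches, select) and the
fallback strategy of dropping subtags from the right are assumptions
made purely for illustration, as are the sample tag lists.

```python
def prefix_matches(range_tag, content_tag):
    """True if range_tag equals content_tag, or is a prefix of it
    that ends on a subtag (hyphen) boundary. Case-insensitive."""
    r = range_tag.lower()
    c = content_tag.lower()
    return c == r or c.startswith(r + "-")


def select(requested, available):
    """Hypothetical fallback: try the full requested tag first, then
    drop subtags from the right until something in `available` matches.
    Returns the matching tags in their original order, or []."""
    subtags = requested.split("-")
    while subtags:
        candidate = "-".join(subtags)
        hits = [t for t in available if prefix_matches(candidate, t)]
        if hits:
            return hits
        subtags.pop()  # discard the rightmost (lowest-priority) subtag
    return []


# Script before region: the script subtag survives the fallback.
script_first = ["zh-Hant-HK", "zh-Hant-TW", "zh-Hans-CN"]
print(select("zh-Hant-MO", script_first))

# Region before script: the script subtag is discarded along with the region.
region_first = ["zh-HK-Hant", "zh-TW-Hant", "zh-CN-Hans"]
print(select("zh-MO-Hant", region_first))
```

With script ahead of region, a request for zh-Hant-MO falls back to
zh-Hant and still gets only traditional-character content. With region
ahead of script, the same request falls all the way back to zh, and
simplified-character content is offered as well.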

Peter Constable
