Chinese "trad": ISO, IANA tags, and ISO CD 15924 (Script Codes)

John Clews
Mon, 11 Feb 2002 21:40:43 GMT

Dear Peter

Perhaps you'd like to respond to this too. I wrote this earlier, but
it also relates to simplified etc. However, I'd also prefer to see
only "[date]" rather than "trad" below, so that comment should modify
some of the text below.

However, it would also be useful to provide an indication of how
other languages fit into the general scheme of things. In
particular, do the Chinese language entities with IANA tags also need
to have subtags indicating simplified/traditional script?

This is prompted by the earlier discussion on the suggested
"trad" subtag in relation to German, and whether this, or something
like it, might be usable with other languages. The query below
relates specifically to Chinese. Other languages that have undergone
spelling or orthography reforms may also fall into this discussion,
but I am not expanding it in this email to include those.

Comments that language codes/tags be provided for simplified Chinese
and traditional Chinese (meaning fonts, in effect) keep coming up
from time to time, with things like zh-CN and zh-TW being broad
"lookalikes" for these, but not exact equivalents.

Suggestions like using zh-cn-1962 (or zh-1962), which more recent
emails in this thread have noted, would be better in this regard.
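As a rough illustration only, the sketch below checks whether tags such as
zh-cn-1962 and zh-1962 are at least well-formed under the RFC 3066 grammar
(a primary subtag of 1 to 8 letters, followed by subtags of 1 to 8 letters
or digits). The function names are hypothetical, and being well-formed says
nothing about whether a tag is registered with IANA or what it means.

```python
import re

# RFC 3066 syntax: primary subtag = 1-8 letters;
# each following subtag = 1-8 letters or digits, separated by "-".
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_wellformed_rfc3066(tag: str) -> bool:
    """Return True if the whole tag matches the RFC 3066 syntax (hypothetical helper)."""
    return _RFC3066_TAG.fullmatch(tag) is not None


def subtags(tag: str) -> list[str]:
    """Split a tag into lower-cased subtags; tags are case-insensitive."""
    return tag.lower().split("-")


if __name__ == "__main__":
    for candidate in ("zh-cn-1962", "zh-1962", "zh-simplified"):
        print(candidate, is_wellformed_rfc3066(candidate), subtags(candidate))
```

Note that a subtag spelled out in full as "simplified" has ten characters and
so would not fit the eight-character limit, which is one practical reason for
preferring short forms.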

Everything to do with this simplified/traditional area is a problem,
because it can be difficult to specify what is meant by any one
code/tag combination. However, unless the ICT community has some
settled ideas, tags will inevitably be ambiguous, which is what
nobody wants.

It strikes me that unless some suggestions are listed somewhere for
simplified Chinese and traditional Chinese (meaning fonts, in effect)
people will use different tags with slightly different meanings.

ISO 15924: Codes for representation of names of scripts should
provide for some of these. Michael Everson is both IANA Language
Tag Reviewer and (pending the rather long-winded resolution by ISO
of extremely minor administrative issues) the designated person to
act as the registration agency for ISO 15924.

I'm sure Michael Everson will be one of the first to respond, and I
would welcome his input here.

I know that I'm fed up with trying to respond to such queries
regarding codes or tags for simplified Chinese and traditional
Chinese (meaning fonts, in effect), and I'm sure that others are too,
so a provisional list of what codes/tags are used for which Chinese
language entities (as listed below) would be extremely helpful.

Any suggestions for adding "trad" and "simplified" Chinese into the
list below (or combinations thereof) would be welcome - and also
saying something about typical scope for each tag, as mentioned at
the end of this text, before the table.

John Clews

PS - as a reminder, I include information on what is usable in
relation to Chinese in IANA registrations. Because I have it
available in the file I took it from, I also include SIL codes
(from the Ethnologue).

SIL codes are listed below only because the information is available
(and SIL codes enable unique identification of a language, even
through any name changes) - there is no suggestion here that these be
used as code entities or tag entities in relation to RFC 3066: that
would be a separate matter.

Comments on and additions to this list (especially listing script
code combination add-ons) would be useful.

Descriptions of scope for individual uses would also be handy.

Distinctions on when to use "zh" and when "zh-guoyu" would also be
useful.

Scope notes would be useful in relation to which might be expected
only to turn up in spoken text, and which should be used in relation
to specifying language attributes of written text.


Usable in relation to RFC 3066:

                   [ISO] [SIL]  [IANA registration and date]

Chinese, Guoyu           [...]  zh-guoyu     21-Mar-00
CHINESE, GAN             [KNN]  zh-gan       21-Mar-00
CHINESE, HAKKA           [HAK]  zh-hakka     03-Apr-01 [1]
CHINESE, MIN BEI         [MNP]  zh-min       21-Mar-00
CHINESE, MIN NAN         [CFR]  zh-min-nan   26-Mar-01 [2] [3]
CHINESE, WU              [WUU]  zh-wuu       21-Mar-00 [4]
CHINESE, XIANG           [HSN]  zh-xiang     21-Mar-00 [5]
CHINESE, YUE             [YUH]  zh-yue       21-Mar-00 [6]


[1] IANA deprecated i-hak on 03-Apr-01
[2] Includes a second tag.
[3] CHINESE, MIN NAN (used in Minnan etc)
[4] CHINESE, WU    (used in Shanghai etc)
[5] CHINESE, XIANG    (used in Hunan etc)
[6] CHINESE, YUE      (used in Cantonese)
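
For anyone who wants the table above in machine-readable form, here is a
minimal sketch of it as a lookup from SIL code to IANA tag. The names
SIL_TO_IANA and iana_tag_for_sil are hypothetical, the data is copied from
the table as given, and (as noted earlier) nothing here suggests that SIL
codes be used as tags under RFC 3066.

```python
from typing import Optional

# SIL (Ethnologue) code -> IANA-registered tag, copied from the table above.
# Guoyu is left out because the table gives no SIL code for it.
SIL_TO_IANA = {
    "KNN": "zh-gan",
    "HAK": "zh-hakka",
    "MNP": "zh-min",
    "CFR": "zh-min-nan",
    "WUU": "zh-wuu",
    "HSN": "zh-xiang",
    "YUH": "zh-yue",
}


def iana_tag_for_sil(sil_code: str) -> Optional[str]:
    """Return the IANA tag listed for a SIL code, or None if the table has none."""
    return SIL_TO_IANA.get(sil_code.upper())


if __name__ == "__main__":
    print(iana_tag_for_sil("HSN"))
    print(iana_tag_for_sil("CJY"))
```

A code with no registered tag in the table, such as CJY, simply yields None
rather than a guessed tag.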

In passing, Chinese language entities not specified in relation to
RFC 3066 include the following:

CHINESE, JINYU           [CJY]

Perhaps some of the above might be expected only to turn up in spoken
text? That comment is based on TV subtitling practices in the PRC.

In addition there is also


Best regards

John Clews

John Clews,
Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432;

Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37/SC2/WG1: Language Codes