LANGUAGE SUBTAG REGISTRATION FORM: pinyin
doug at ewellic.org
Sun Aug 3 20:02:10 CEST 2008
Kent Karlsson <kent dot karlsson14 at comhem dot se> wrote:
> I think it is a bad idea to have this variant for the macrolanguage
> code 'zh'. 'cmn-Latn-pinyin', 'yue-Latn-pinyin' (4646bis), ok. But not
> 'zh-Latn-pinyin' as the latter is ambiguous (unless one sees 'zh' as
> an alias for 'cmn'...).
There has been a long and contentious discussion on LTRU about this
whole macrolanguage thing. Just about everyone agrees that 'zh' should
not be perceived as a 1-to-1 alias for 'cmn', but just about everyone
also agrees that, for various reasons, most content tagged 'zh' is in
fact Mandarin, and so some people may assume, de facto, that 'zh' means
Mandarin.
LTRU came up with a very delicately crafted solution regarding
macrolanguages, after a lot of effort and as a result of different
viewpoints and perspectives heard over a long period of time, and I
don't think any other solution could have been reached without at least
as much controversy and pain.
"zh-Latn-pinyin" or even "zh-pinyin" would indeed be ambiguous in that
they don't specify which variety of "Chinese" is being written in
Pinyin. For that matter, "zh" or "zh-Hant" or "zh-CN" are also
ambiguous in the same way. We can expect a good deal of this ambiguity
in the future, and should not expect that everyone will immediately
retag all their "zh" content as "cmn" or "yue".
>> It's not as though we were talking about using 'boont' with Russian.
"ru-boont" would be impossible by definition. "zh-(Latn-)pinyin" may be
underspecified, but certainly is not impossible.
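To make the "impossible vs. underspecified" distinction concrete, here is
a minimal sketch, assuming a tiny hand-copied subset of the registry.
'boont' (Boontling) carries Prefix "en". The 'pinyin' prefixes below are
hypothetical placeholders, since that Prefix is exactly what is being
debated here. The function names and result strings are illustrative,
not part of any real library.

```python
# Hypothetical subset of the IANA Language Subtag Registry.
# 'boont' really has Prefix "en"; the 'pinyin' entries are ASSUMPTIONS
# standing in for whatever the final registration says.
VARIANT_PREFIXES = {
    "boont": ["en"],
    "pinyin": ["zh-Latn", "cmn-Latn", "yue-Latn"],  # hypothetical
}

# Encompassed language -> macrolanguage (subset, per draft-4646bis data).
MACROLANGUAGE = {"cmn": "zh", "yue": "zh"}


def prefix_matches(tag: str, prefix: str) -> bool:
    """True if `prefix` is a leading run of whole subtags of `tag`
    (compared case-insensitively)."""
    t = tag.lower().split("-")
    p = prefix.lower().split("-")
    return t[: len(p)] == p


def classify(tag: str) -> str:
    """Rough, illustrative classification of a tag's suitability."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    for sub in subtags[1:]:
        prefixes = VARIANT_PREFIXES.get(sub.lower())
        # A variant whose registered prefixes don't fit the tag is
        # unsuitable, as with Boontling written "as Russian".
        if prefixes is not None and not any(
            prefix_matches(tag, p) for p in prefixes
        ):
            return "unsuitable prefix"
    # A macrolanguage primary subtag is legitimate but does not say
    # which encompassed language (cmn, yue, ...) is meant.
    if primary in MACROLANGUAGE.values():
        return "valid but underspecified (macrolanguage)"
    return "ok"
```

With these assumed entries, classify("ru-boont") reports an unsuitable
prefix, while classify("zh-Latn-pinyin") is merely underspecified and
classify("cmn-Latn-pinyin") is fine.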
Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:
> ISO 639-3 claims that 'zho' is an alias for 'zh', not 'cmn'. IMO a
> reason why adding ISO 639-3 to the IANA subtag registry without
> stating what is what could create a huge mess.
I think quite a bit of the text of draft-4646bis has been devoted to
"stating what is what." It is not simply left up to interpretation.
The danger, as always (and unavoidably), comes from people deciding to
use the Registry itself for operational guidance without bothering to
read the RFC that provides all these explanations.
Kent replied to Frank:
> You missed my (rhetorical) point completely. cmn is the code for one
> of the languages (Mandarin) *encompassed* by the macrolanguage code(s)
> zh/zho (Chinese). zh-Latn-pinyin is in the (current) proposed
> registration form limited to Mandarin, while 'zh' is not limited to
> Mandarin. And Pinyin for Cantonese (yue, also encompassed by zh/zho)
> differs in both language and romanization rules.
and Randy Presuhn <randy underscore presuhn at mindspring dot com> wrote:
> This is *NOT* the same orthography. It's not even terribly similar -
> it's about like saying that English and Swedish share a common
> orthography. The Pinyin in the registration request is optimized to
> the phonology of Mandarin. It only provides for four tones (+
> neutral), and is consequently ill-suited to languages like Cantonese
> that have more elaborate tone systems.
It depends on what exactly is meant by "Pinyin" in the registration
request. Back to the English/Swedish metaphor: the orthographies used
by these two languages aren't 100% equivalent, but they aren't 100%
different either: 'b' still means [b].
If Mark's intent is to represent all Pinyin-like romanizations, that is
one thing, and if his intent is to specify only the Hanyu Pinyin defined
for Mandarin, that is another. In the latter case, and only in that
case, would Gerard Meijssen's characterization of "homonyms" be
accurate. And in that case, I would withdraw any objection to
specifying Mandarin in the registration request.
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14