LANGUAGE SUBTAG REGISTRATION FORM: pinyin

Sun Aug 3 20:02:10 CEST 2008

Kent Karlsson <kent dot karlsson14 at comhem dot se> wrote:

> I think it is a bad idea to have this variant for the macrolanguage 
> code 'zh'. 'cmn-Latn-pinyin', 'yue-Latn-pinyin' (4646bis), ok. But not 
> 'zh-Latn-pinyin' as the latter is ambiguous (unless one sees 'zh' as 
> an alias for 'cmn'...).

There has been a long and contentious discussion on LTRU about this 
whole macrolanguage thing.  Just about everyone agrees that 'zh' should 
not be perceived as a 1-to-1 alias for 'cmn', but just about everyone 
also agrees that, for various reasons, most content tagged 'zh' is in 
fact Mandarin, and so some people may treat 'zh' as Mandarin de facto.

LTRU came up with a very delicately crafted solution regarding 
macrolanguages, after a lot of effort and after hearing different 
viewpoints and perspectives over a long period of time, and I 
don't think any other solution could have been reached without at least 
as much controversy and pain.

"zh-Latn-pinyin" or even "zh-pinyin" would indeed be ambiguous in that 
they don't specify which variety of "Chinese" is being written in 
Pinyin.  For that matter, "zh" or "zh-Hant" or "zh-CN" are also 
ambiguous in the same way.  We can expect a good deal of this ambiguity 
in the future, and should not expect that everyone will immediately 
retag all their "zh" content as "cmn" or "yue".

>> It's not as though we were talking about using 'boont' with Russian.
>
> ???

"ru-boont" would be impossible by definition.  "zh-(Latn-)pinyin" may be 
underspecified, but certainly is not impossible.
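For readers less familiar with the Registry: the contrast between 
"ru-boont" and "zh-(Latn-)pinyin" rests on the 'Prefix' field that 
RFC 4646 allows in a variant subtag record, naming the tag(s) the 
variant is recommended to follow.  'boont' (Boontling) is registered 
with Prefix 'en'.  What follows is a simplified, hypothetical prefix 
check in Python, not a BCP 47 validator.  VARIANT_PREFIXES is a 
hand-written two-entry excerpt, and the 'zh-Latn' prefix for 'pinyin' 
is an assumption taken from the proposal under discussion in this 
thread, not a registered value.

```python
# Hypothetical excerpt of variant 'Prefix' fields; NOT the full IANA registry.
VARIANT_PREFIXES = {
    "boont": ["en"],        # Boontling is registered with Prefix: en
    "pinyin": ["zh-Latn"],  # assumption: prefix proposed in this thread
}


def prefix_ok(tag: str) -> bool:
    """Return True if every known variant in `tag` follows one of its prefixes.

    Simplified sketch: assumes `tag` is already well-formed, ignores
    extensions and private-use subtags, and compares case-insensitively.
    Subtags not listed in VARIANT_PREFIXES are not checked at all.
    """
    subtags = tag.lower().split("-")
    for i, sub in enumerate(subtags):
        prefixes = VARIANT_PREFIXES.get(sub)
        if prefixes is None:
            continue  # not one of the variants in our excerpt
        preceding = subtags[:i]
        # The tag must begin with the subtags of at least one listed prefix.
        if not any(
            preceding[: len(p.split("-"))] == p.lower().split("-")
            for p in prefixes
        ):
            return False
    return True


if __name__ == "__main__":
    for t in ["ru-boont", "en-boont", "zh-Latn-pinyin", "zh-Hant"]:
        print(t, prefix_ok(t))
```

Under these assumptions prefix_ok("ru-boont") is False while 
prefix_ok("zh-Latn-pinyin") is True.  A check like this can flag a 
combination that makes no sense, but it says nothing about which 
variety of Chinese 'zh' denotes, which is the underspecification 
discussed above.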

Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> ISO 639-3 claims that 'zho' is an alias for 'zh', not 'cmn'.  IMO a 
> reason why adding ISO 639-3 to the IANA subtag registry without 
> stating what is what could create a huge mess.

I think quite a bit of the text of draft-4646bis has been devoted to 
"stating what is what."  It is not simply left up to interpretation. 
The danger, as always (and unavoidably), comes from people deciding to 
use the Registry itself for operational guidance without bothering to 
read the RFC that provides all these explanations.

Kent replied to Frank:

> You missed my (rhetorical) point completely. cmn is the code for one 
> of the languages (Mandarin) *encompassed* by the macrolanguage code(s) 
> zh/zho (Chinese). zh-Latn-pinyin is in the (current) proposed 
> registration form limited to Mandarin, while 'zh' is not limited to 
> Mandarin. And Pinyin for Cantonese (yue, also encompassed by zh/zho) 
> differs in both language and romanization rules.

and Randy Presuhn <randy underscore presuhn at mindspring dot com> 
wrote:

> This is *NOT* the same orthography.  It's not even terribly similar - 
> it's about like saying that English and Swedish share a common 
> orthography. The Pinyin in the registration request is optimized to 
> the phonology of Mandarin.  It only provides for four tones (+ 
> neutral), and is consequently ill-suited to languages like Cantonese 
> that have more elaborate tone systems.

It depends on what exactly is meant by "Pinyin" in the registration 
request.  Back to the English/Swedish metaphor: the orthographies used 
by these two languages aren't 100% equivalent, but they aren't 100% 
different either: 'b' still means [b].

If Mark's intent is to represent all Pinyin-like romanizations, that is 
one thing, and if his intent is to specify only the Hanyu Pinyin defined 
for Mandarin, that is another.  In the latter case, and only in that 
case, would Gerard Meijssen's characterization of "homonyms" be 
accurate.  And in that case, I would withdraw any objection to 
specifying Mandarin in the registration request.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ