LANGUAGE SUBTAG REGISTRATION FORM: pinyin
Karen_Broome at spe.sony.com
Tue Aug 5 01:54:02 CEST 2008
Notes in line:
From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Monday, August 04, 2008 4:04 PM
To: Broome, Karen
Cc: Phillips, Addison; Kent Karlsson; ietf-languages at alvestrand.no
Subject: Re: LANGUAGE SUBTAG REGISTRATION FORM: pinyin
I think you see some Dark Conspiracy where there is none. My strong intention, shared I'm sure by others on this list like John, is to add the prefixes zh-cmn and cmn, once those are available.
Karen: Not a dark conspiracy, but a strong difference of opinion perhaps based on the use cases we represent: identification of content in tightly defined industry protocols vs. fuzzy retrieval of historical content across a wide variety of industries and content creators. Of course, this RFC needs to serve us both.
zh-guoyu cannot be added as a prefix, since zh-guoyu-pinyin would be illegal.
zh-cmn also cannot be added as a prefix under RFC 4646.
Karen: Point taken. I wasn't suggesting that zh-cmn be added under RFC 4646 for this variant tag, but you're right about Guoyu.
Remember, prefixes are a guide for good usage, to indicate that other values would be *inappropriate*. zh-cmn-Latn-pinyin makes sense, as does zh-Latn-pinyin. But da-pinyin *doesn't*. It is cases like the latter that the prefixes are designed to discourage.
Karen: Right. I think "zh-pinyin" is bad usage. Like other "zh" tags, this tag has multiple meanings and could be easily misinterpreted.
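To make the prefix rule discussed above concrete, here is a minimal sketch of how a tool might check a tag against a variant's prefixes. The function name `is_recommended` and the prefix list are hypothetical, chosen only to mirror the examples in this thread; they are not the registered record for "pinyin", and the comparison is a simplified subtag-by-subtag match rather than the full matching rules of the RFC.

```python
def is_recommended(tag, variant, prefixes):
    """Return True if the subtags before `variant` begin with one of `prefixes`.

    Simplified illustration: prefixes are advisory, so a False result means
    "discouraged usage", not "malformed tag".
    """
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    # Everything that precedes the variant subtag in the tag.
    head = subtags[:subtags.index(variant)]
    for prefix in prefixes:
        parts = prefix.lower().split("-")
        # Compare whole subtags, never raw string prefixes.
        if head[:len(parts)] == parts:
            return True
    return False


# Hypothetical prefix list for illustration only (an assumption, not registry data).
ASSUMED_PREFIXES = ["zh-Latn", "zh-cmn-Latn"]

for candidate in ["zh-Latn-pinyin", "zh-cmn-Latn-pinyin", "da-pinyin", "zh-pinyin"]:
    print(candidate, is_recommended(candidate, "pinyin", ASSUMED_PREFIXES))
```

Under these assumed prefixes, the first two tags pass and "da-pinyin" and "zh-pinyin" are flagged as discouraged, which matches the usage described above.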
And zh is not deprecated in RFC 4646, nor is it deprecated in the current text of RFC 4646bis, nor can I foresee that it would ever be deprecated.
Karen: Mark, I don't think you have as much agreement here as you think you do -- even those that don't support deprecation now may support deprecation in the future when they are dealing with the more balanced percentages of Cantonese and Mandarin found in audio contexts and seeing these variants expressed in a wide variety of ways. In not deprecating it, we are giving this tag a special privilege not given to other tags: allowing more than one preferred variant. Every exception to the rule makes this work harder and harder to explain to those who are trying to use it.
We should consider deprecating "zh" in the future because best practice in all tagging is to use unambiguous tags. Deprecation does not mean that this tag cannot be used; it means that it is not preferred and there is a newer, better tag that should be used instead. I do not think using "zh" continues to be a wise choice for content creators who want to make sure their tags are interpreted correctly downstream.
Of course, search engines will need to be aware of this tag for a long time to come, but I believe we are failing to do our job if we do not warn content creators that today's Internet is an audiovisual experience and, in this context, "zh" is a bad choice for a tag. The percentages you see in the public text you analyzed would not hold true for private audio uses your bots can't possibly find.