LANGUAGE SUBTAG REGISTRATION FORM (R3): pinyin

CE Whitehead cewcathar at hotmail.com
Tue Sep 9 18:42:33 CEST 2008



Hi. As Frank and others have said before, I do not see why the fact that people leave off [Latn] means that we should not recommend [Latn] as a prefix. That is tantamount to saying that because people may tag incorrectly, we should stop worrying about having correct subtags at all.  I agree with Frank that what is important is somehow telling search engines and similar tools that this is [Latn] script.  One solution is a [Suppress-Script] for variants, although that will not take care of the problem that both [fonipa] and [pinyin] can be combined in a single tag.


As for whether having multiple prefixes will best solve the [pinyin] problem (there might not be too many if we use [zh] as a catch-all prefix rather than to mean 'Mandarin', as I think Mark intended), I am not sure, so I leave that to the list and to Michael.
 
I still think of [fonipa] as a bit different from [pinyin], because any language subtag can be used as its prefix. In addition, the IPA (International Phonetic Alphabet) makes use of many more non-Latin characters than [pinyin] does, because the IPA attempts to capture the various languages' sounds more exactly. (I think the Unicode charts organize such symbols, for example the glottal stop (heard at the beginning of, say, 'apple', and important in Semitic languages), the palatal n, and other letters, as 'special use' characters, not as 'Latin'.) Of course it might still be nice to be able to distinguish [fonipa] from [fonupa] with tags indicating the script . . .
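
One way to check how Unicode names the two symbols mentioned above is to look up their character names. This is a minimal sketch using Python's standard unicodedata module; the choice of U+0294 for the glottal stop and U+0272 for the palatal n is an assumption about which characters are meant. Character names alone do not say which chart a symbol is printed in, so this is only a partial check.

```python
import unicodedata

# Assumed code points: U+0294 (glottal stop) and U+0272 (palatal n).
symbols = ("\u0294", "\u0272")

# Map "U+XXXX" to the official Unicode character name.
names = {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in symbols}

for code_point, name in names.items():
    print(code_point, name)
```

Both names begin with "LATIN", even though the characters are listed under the IPA Extensions block rather than in the basic Latin charts.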


If Michael's registration form goes through, my suggestion would be to correct the description field if needed--if we intend [pinyin] to refer as well to Romanizations of Tibetan in the People's Republic of China.


--C. E. Whitehead
cewcathar at hotmail.com



Michael Everson everson at evertype.com 
Mon Sep 8 22:22:49 CEST 2008


> Well, more likely this:

> LANGUAGE SUBTAG REGISTRATION FORM
> 3. Record Requested:

> Type: variant
> Subtag: pinyin
> Description: Pinyin romanization of Chinese
If [pinyin] includes romanizations of non-Chinese languages (such as Tibetan), I think this should be made clear in the description field.
> 4. Intended meaning of the subtag:

> To distinguish Chinese content written in Latin characters using the
> Pinyin romanization (transliteration/transcription) from other
> possible transcriptions, particularly from Wade-Giles.
> The primary use is for Mandarin Chinese (where the prefixes zh- and/or
> zh-Latn- may be used); other languages may also use this subtag, with
> or without -Latn-.

Should this be corrected to "Primarily to distinguish Chinese content"?

My understanding is that [pinyin] in some form is also used for a number of Tibeto-Burman languages spoken in China, and that it is the most common romanization used to write Chinese and some other languages in the People's Republic. It is not officially used in Taiwan (but may be used unofficially?).

I am not sure whether this is what we mean by [pinyin].


> Michael Everson *  



(I did do some research, as Michael suggested. I did not find much, except for two Tibeto-Burman languages that use Pinyin as their romanization in China. I also found one Eastern Mongolian language for which scholars are experimenting with an orthography based on Pinyin; that orthography is not Pinyin itself, and the language currently uses another Latin-based orthography anyway.  So it seems that the only non-Chinese languages that use 'pinyin' are in the Tibeto-Burman family???


Of course, I do not know; I know little about Chinese, and I was usually more interested in syntax and rhetoric than in most phonology anyway!)


http://www.ethnologue.com/show_language.asp?code=dta


"scholars are experimenting with a Latin orthography based on Pinyin"
(Family:  Eastern Mongolian)


* * *


http://www.ethnologue.com/14/show_language.asp?code=CGP
(Family:  Tibeto-Burman)


* * *


https://www.ethnologue.com/show_language.asp?code=kac
(Family:  Tibeto-Burman)

* * *

Frank Ellermann nobody at xyzzy.claranet.de


> Michael Everson wrote:


>> I'd strip the remark about "omitting Latn", however.

>> I could live with that but I think in the real world
>> people will omit it.


> That's precisely the problem, as user you could be
> interested in "any zh-Latn".  Simple right-to-left
> matching won't find zh-fonipa.  Hopeless case, we
> can ignore it.


> A smart matching process could "know" that zh-fonipa
> is actually a shorthand for zh-Latn-fonipa.


> But it only knows this if the registry contains the
> info.  Or if the application has hardcoded knowledge
> about this special case.





Agreed!
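
The matching behaviour Frank describes can be sketched roughly as follows. This is only an illustration, not anything defined by the registry: the IMPLIED_SCRIPT table and the function names are hypothetical, and the table stands in for information that would have to come either from the registry or from hardcoded application knowledge.

```python
# Hypothetical table: variants assumed to imply Latin script when the tag
# carries no script subtag. Nothing like this exists in the registry today.
IMPLIED_SCRIPT = {"fonipa": "Latn", "pinyin": "Latn"}


def has_script(subtags):
    """True if any subtag after the language looks like a 4-letter script."""
    return any(len(s) == 4 and s.isalpha() for s in subtags[1:])


def expand_implied_script(tag):
    """Rewrite e.g. zh-fonipa as zh-Latn-fonipa using the table above."""
    subtags = tag.split("-")
    if has_script(subtags):
        return tag
    for subtag in subtags[1:]:
        script = IMPLIED_SCRIPT.get(subtag.lower())
        if script:
            return "-".join([subtags[0], script] + subtags[1:])
    return tag


def prefix_match(language_range, tag):
    """Simple case-insensitive prefix matching on whole subtags."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def smart_match(language_range, tag):
    """Prefix matching after expanding any implied script."""
    return prefix_match(language_range, expand_implied_script(tag))


print(prefix_match("zh-Latn", "zh-fonipa"))  # simple matching misses it
print(smart_match("zh-Latn", "zh-fonipa"))   # expansion makes it match
```

The sketch ignores private-use and extension subtags, so a real matcher would need more care; the point is only that smart_match depends entirely on the contents of the table.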


> Counting pinyin we're already at three special cases,
> all following one "treat *-x as *-Latn-x" pattern.


> There will be more special cases with other matching
> patterns, "hardcoding" does not scale.


>> What's wrong with tagging bo-Latn-pinyin?


> Nothing.  But bo-pinyin is a problem when looking for
> bo-Latn.  Generic variants instead of an extension
> also have the strange effect that zh-fonipa-pinyin
> is syntactically allowed.  Unlike zh-FR-GB.





That is a problem.
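
To illustrate why zh-fonipa-pinyin passes a syntax check while zh-FR-GB does not, here is a rough sketch of a well-formedness test. The regular expression is a deliberately simplified assumption covering only language, optional script, optional region, and any number of variants; it leaves out extended language subtags, extensions, private use, and grandfathered tags, and the name is_well_formed is hypothetical.

```python
import re

# Simplified pattern: one language, at most one script, at most one region,
# then zero or more variants (5-8 alphanumerics, or 4 starting with a digit).
LANGTAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)",
    re.IGNORECASE,
)


def is_well_formed(tag):
    """Check a tag against the simplified pattern above (syntax only)."""
    return LANGTAG.fullmatch(tag) is not None


for tag in ("zh-Latn-pinyin", "bo-pinyin", "zh-fonipa-pinyin", "zh-FR-GB"):
    print(tag, is_well_formed(tag))
```

Because the syntax has only one region slot but an open-ended list of variants, nothing at this level can say that [fonipa] and [pinyin] do not belong together.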





> With an
> extension we could express what makes sense.





> Frank



--C. E. Whitehead
cewcathar at hotmail.com 

