<div dir="ltr">Hoi,<br>It is exactly when you are talking about Pinyin that there is no relation at all between the text written in logograms and the text written in the Latin script. Pinyin represents the pronunciation of spoken Mandarin words. Consequently, it is wrong to associate Pinyin with written Chinese. This relation is just not there.<br>

Thanks,<br>&nbsp;&nbsp;&nbsp;&nbsp; Gerard<br><br><div class="gmail_quote">On Tue, Aug 5, 2008 at 10:15 AM, Tracey, Niall <span dir="ltr">&lt;<a href="mailto:niall.tracey@logica.com">niall.tracey@logica.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

From: <a href="mailto:mark.edward.davis@gmail.com">mark.edward.davis@gmail.com</a> [mailto:<a href="mailto:mark.edward.davis@gmail.com">mark.edward.davis@gmail.com</a>] On Behalf Of Mark Davis<br>

Sent: 04 August 2008 18:57<br>

<div class="Ih2E3d"><br>

&gt; Mandarin text has been validly tagged as &#39;zh&#39;, and will continue to be validly tagged as &#39;zh&#39;.<br>

<br>

</div>But what *is* Mandarin text?<br>

<br>

The point of &quot;zh&quot; is that a text written in Chinese logograms is not necessarily Mandarin. As I understand it, there are many Chinese languages that share a mutually comprehensible written mode -- it&#39;s pretty much impossible to point to a Chinese text and identify it unambiguously as Mandarin, unless the writer uses a lot of slang or colloquial idioms.<br>


<br>

However, once we write something in a pinyin, it is clear to us which Chinese language it is, so we really should be more specific -- if we skip a step in the hierarchy, it makes searching more complicated.<br>

<br>

Surely the point of a hierarchical naming convention is to allow rapid pruning of a dataset without having to examine all levels? With an explicit hierarchy we can do this very efficiently: if we want to search for text that a Mandarin speaker is likely to understand, we can do two steps to prune the search-space:<br>


1) Cut any texts not marked ZH<br>

2) Cut any texts with a variant other than CMN<br>

<br>

I&#39;m sure there&#39;s some intricacy of the current system that I&#39;ve missed that already makes this impossible in practice, but I feel we should aim to get closer to this state of affairs. Having to search for zh with cmn and/or pinyin and/or ... but not xx, yy, zz, etc. is overcomplicated and will lead to errors. Not only this, but arguably it doesn&#39;t make the job of tagging the text easier in the first place. It&#39;s confusing when there are three or four &quot;correct&quot; ways of doing something.<br>


<br>

I&#39;m opposed to hiding any data that makes everyone&#39;s job harder.<br>

<div><div></div><div class="Wj3C7c"><br>

Níall.<br>

<br>


_______________________________________________<br>

Ietf-languages mailing list<br>

<a href="mailto:Ietf-languages@alvestrand.no">Ietf-languages@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/ietf-languages" target="_blank">http://www.alvestrand.no/mailman/listinfo/ietf-languages</a><br>

</div></div></blockquote></div><br></div>