Some clarification.<br><br>1. It appears that you may think that NFKC forbids combining marks. It does <span style="font-style: italic;">not</span>; it only forbids sequences that could be expressed with a precomposed form (with a few exceptions). Thus:

<br><br>A + acute is forbidden in NFKC<br>X + cedilla is not forbidden in NFKC<br><br>See <a href="http://www.unicode.org/reports/tr15/#Primary_Exclusion_List_Table">http://www.unicode.org/reports/tr15/#Primary_Exclusion_List_Table

</a><span style="font-weight: bold;"><br><br></span>2. Unicode composition and decomposition are not based on visual confusability (referring to your memo on Gujarati). For example, &quot;m&quot; does not decompose to &quot;rn&quot; even though those two sequences are visually confusable (at address box sizes in common fonts they look the same). Nor are they simply based on origin: &quot;w&quot; does not decompose to &quot;vv&quot;. For more examples, see 

<a href="http://unicode.org/charts/normalization/">http://unicode.org/charts/normalization/</a>.<br><br>Visual similarity is much broader than Unicode composition and decomposition. See <a href="http://www.unicode.org/reports/tr39/#Confusable_Detection">

http://www.unicode.org/reports/tr39/#Confusable_Detection</a>. Baking visual similarity into the protocol would be a real problem for many, many languages: it would be the equivalent of disallowing the use of the letter &quot;m&quot; in English.
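<br><br>Both points can be checked with a minimal sketch, assuming Python's standard unicodedata module. The helper name is_nfkc is only for illustration, and "forbidden in NFKC" is read here as "the string changes when normalized to NFKC":

```python
import unicodedata

ACUTE = "\u0301"    # COMBINING ACUTE ACCENT
CEDILLA = "\u0327"  # COMBINING CEDILLA


def is_nfkc(s):
    """Hypothetical helper: True if s is unchanged by NFKC normalization."""
    return unicodedata.normalize("NFKC", s) == s


# Point 1: A + acute has a precomposed form (U+00C1), so the sequence
# is replaced under NFKC; X + cedilla has none, so the mark stays.
a_acute = unicodedata.normalize("NFKC", "A" + ACUTE)
x_cedilla = unicodedata.normalize("NFKC", "X" + CEDILLA)
print([hex(ord(c)) for c in a_acute])
print([hex(ord(c)) for c in x_cedilla])
print(is_nfkc("A" + ACUTE), is_nfkc("X" + CEDILLA))

# Point 2: normalization is not about visual similarity or origin.
# "m" and "w" are left alone by every normalization form.
for ch in ("m", "w"):
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        assert unicodedata.normalize(form, ch) == ch
```

The first sequence comes back as a single code point, while the second keeps its combining mark, so combining marks as such survive NFKC.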

<br><br>Mark<br><br><div><span class="gmail_quote">On 11/26/06, <b class="gmail_sendername">Sam Vilain</b> &lt;<a href="mailto:sam.vilain@catalyst.net.nz">sam.vilain@catalyst.net.nz</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

John C Klensin wrote:<br>&gt;&gt; But doesn't è decompose to a sequence including that mark?<br>&gt;&gt;<br>&gt; I may miss your point but, if I don't, that is one of the<br>&gt; reasons we have used NFKC, rather than NFKD, all along.

<br>&gt;<br><br>Oh, right :-}.&nbsp;&nbsp;Funny how little details like that can be missed.&nbsp;&nbsp;I<br>thought it happened the other way around.<br><br>This is a bit of a problem.&nbsp;&nbsp;The Indic scripts must be able to use their<br>combining marks/vowel signs; they don't have a rich enough set of

<br>pre-composed characters to write their language.&nbsp;&nbsp;And if romanised forms<br>of African languages need compositions which are not already there, then<br>they will never work.<br><br>This might need to wait for the next version, but it should be possible

<br>to permit combining characters without breaking backwards compatibility<br>or losing the intent of this specification, you'd need to:<br><br>1. be able to classify combining marks with their target scripts, to<br>make sure that you're not trying to combine a Latin diacritical mark

<br>with a Chinese ideograph (etc)<br><br>2. disallow combining marks except in places where they're expected<br><br>3. standardise on the NFKD form, except for where a pre-composed form<br>exists.<br><br>It's ugly, but any tidier suggestions that don't exclude &gt;25% of the

<br>world's population?&nbsp;&nbsp;:)<br><br>--<br>Sam Vilain, Systems Architect, Catalyst IT (NZ) Ltd.<br>phone: +64 4 499 2267&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;PGP ID: 0x66B25843<br><br>_______________________________________________<br>Idna-update mailing list

<br><a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br><a href="http://www.alvestrand.no/mailman/listinfo/idna-update">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br></blockquote></div>
