I want to make sure that people understand that the contextual rules in <a href="http://www.unicode.org/reports/tr31/#Layout_and_Format_Control_Characters">http://www.unicode.org/reports/tr31/#Layout_and_Format_Control_Characters</a> (section 2.3) are not perfect. They do not characterize <i>precisely</i> all and only those cases where joiners make a visual difference. Part of the issue is that the results may vary somewhat by font.<br>

<br>What those rules do is filter the problematic cases down to an extremely small set, as a percentage of total text. They prevent problems with the majority of scripts: Latin, Cyrillic, CJK, and so on. They also do a good job with Arabic, with any normal font.<br>

<br>With Indic scripts, the situation is slightly different. The rules limit the cases severely, disallowing joiners after almost all characters, where they make no visual difference. However, taking Malayalam as an example, something like half of the cases where the rules allow joiners will not typically show any difference in visual display. With Tamil, even fewer; with Sinhala, more.<br>

<br>Now, that &quot;filtering down to an extremely small set&quot; is worth doing, either in the protocol or via client-side notification, but I just wanted people to understand its limitations: it is not a panacea.<br>
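<br>As an illustration only, here is a minimal Python sketch of the general shape of such rules, modeled loosely on the joiner rules in UAX #31 section 2.3: either joiner is accepted directly after a virama (Canonical_Combining_Class 9), and ZWNJ is otherwise accepted only between a character that can join to the left and one that can join to the right, skipping transparent marks. Python&#39;s unicodedata module does not expose the Joining_Type property, so the JOINING_TYPE table is a hypothetical stand-in covering just a few Arabic-script letters, and joining_type() and joiner_allowed() are illustrative helpers, not an existing API.<br>

```python
import unicodedata

ZWNJ = "\u200c"
ZWJ = "\u200d"
VIRAMA = 9  # Canonical_Combining_Class value for viramas

# Hypothetical stand-in for the Joining_Type property (ArabicShaping.txt),
# which Python's unicodedata module does not expose. Only a few
# Arabic-script letters are listed, for illustration.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0648": "R",  # ARABIC LETTER WAW
    "\u062e": "D",  # ARABIC LETTER KHAH
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
}


def joining_type(ch):
    # Treat nonspacing marks as transparent (T) and every unlisted
    # character as non-joining (U); a real check would use the full table.
    if unicodedata.category(ch) == "Mn":
        return "T"
    return JOINING_TYPE.get(ch, "U")


def joiner_allowed(label, i):
    """Check the joiner at label[i] against the sketched contextual rules."""
    ch = label[i]
    if ch not in (ZWJ, ZWNJ):
        raise ValueError("label[i] must be ZWJ or ZWNJ")
    # Both joiners are permitted immediately after a virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA:
        return True
    if ch == ZWJ:
        return False
    # Otherwise ZWNJ needs a left-joining character (L or D) before it and a
    # right-joining character (R or D) after it, skipping transparent marks.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    k = i + 1
    while k != len(label) and joining_type(label[k]) == "T":
        k += 1
    before_ok = j >= 0 and joining_type(label[j]) in ("L", "D")
    after_ok = k != len(label) and joining_type(label[k]) in ("R", "D")
    return before_ok and after_ok


# Persian word with ZWNJ between FARSI YEH and KHAH.
print(joiner_allowed("\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645", 2))  # True
# ZWNJ between Latin letters.
print(joiner_allowed("ab\u200cc", 2))  # False
# Devanagari KA, VIRAMA, ZWJ, SSA.
print(joiner_allowed("\u0915\u094d\u200d\u0937", 2))  # True
```

<br>The checks operate purely on code points: a ZWNJ between FARSI YEH and KHAH passes, a joiner between Latin letters fails, and a ZWJ after a Devanagari virama passes. Nothing in them can tell whether a particular font will actually render that Devanagari ZWJ differently, which is exactly the limitation described above.<br>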

<br clear="all">Mark<br>

<br><br><div class="gmail_quote">On Thu, Feb 26, 2009 at 20:17, John C Klensin <span dir="ltr">&lt;<a href="mailto:klensin@jck.com">klensin@jck.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<br>

--On Thursday, February 26, 2009 19:46 -0800 Paul Hoffman<br>

<div class="Ih2E3d">&lt;<a href="mailto:phoffman@imc.org">phoffman@imc.org</a>&gt; wrote:<br>

<br>

&gt; This ties into John&#39;s recent thread on parsing the issues and<br>

&gt; finding a middle ground. I have included many of the<br>

&gt; suggestions from the mailing list and off-line responses. Most<br>

&gt; significantly, I have changed ZWNJ and ZWJ from &quot;mapped to<br>

&gt; nothing&quot; to being allowed so that Arabic labels will be more<br>

&gt; realistic.<br>

<br>

</div>Paul,<br>

<br>

With the understanding that I still don&#39;t believe this is the<br>

right way to go, one technical correction and one issue:<br>

<br>

(1) ZWJ and ZWNJ are not needed for Arabic language orthography.<br>

ZWNJ is needed for Persian languages and what are sometimes<br>

called Indo-Arabic ones (e.g., Urdu, but there are _many_<br>

others).  Both ZWJ and ZWNJ are needed for several of the Indic<br>

scripts and associated languages (although slightly fewer with<br>

Unicode 5.1 than with Unicode 3.2).<br>

<br>

(2) When one considers the number of registries/zones on the<br>

Internet or even those that exist only at the second level<br>

(i.e., maintaining registrations for third-level names), it is<br>

certain that some of them will be operated by people with bad<br>

intentions.  Given that, are you confident that ZWJ/ZWNJ can<br>

simply be treated as ordinary characters, relying on the<br>

registries to prohibit those characters where they would be fully<br>

invisible?<br>

<br>

When faced with that question very early in the IDNA2008 design<br>

process, we concluded that there were four possible answers:<br>

<br>

        (i) Yes, we trust the registries and are willing to live<br>

        with labels like &quot;ábc&quot; failing to compare equal to<br>

        &quot;áb&lt;ZWJ&gt;c&quot; despite looking exactly the same when<br>

        displayed by normal rendering software.<br>

<br>

        (ii) We don&#39;t quite trust the registries but are<br>

        confident that all rendering software, on all operating<br>

        systems, that encounters strings like &quot;áb&lt;ZWJ&gt;c&quot; will<br>

        get upset in sufficiently vivid ways to warn the user off.<br>

        We didn&#39;t think rendering ZWJ as a little box or<br>

        question mark would be adequate for that case, because it<br>

        might signal a legitimate character for which no font was<br>

        available, even though it would at least not be confused<br>

        with &quot;ábc&quot;.<br>

<br>

        (iii) We either leave things as they are in IDNA2003<br>

        (map to nothing) or simply ban the characters.  Either<br>

        one puts the scripts that need one or both of these<br>

        characters at an intolerable disadvantage.<br>

<br>

        (iv) We adopt some sort of &quot;contextual rule&quot; model,<br>

        despite the complexity it adds.<br>

<br>

Obviously, we chose the fourth.  We did so because we didn&#39;t<br>

believe the assumptions that (i) or (ii) implied and did not<br>

consider (iii) to be acceptable given the number of people who<br>

use the relevant scripts.   As I read your document, you are<br>

proposing (i).   Is that correct and, if so, could you explain a<br>

bit better how you see the tradeoffs?<br>
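<br>

A minimal Python sketch of the comparison behind options (i) and (iii), using only the standard library: unicodedata.normalize, stringprep.in_table_b1 (nameprep&#39;s table B.1), and the built-in &quot;idna&quot; codec, which implements IDNA2003 (RFC 3490 and RFC 3491). The two sample labels are illustrative assumptions.<br>

```python
import stringprep
import unicodedata

ZWJ = "\u200d"

plain = "\u00e1bc"                # a-acute, b, c
with_zwj = "\u00e1b" + ZWJ + "c"  # same letters with U+200D between b and c

# Option (i): ZWJ is kept as an ordinary character. NFC normalization leaves
# it in place, so the two labels do not compare equal even though they
# normally look identical on screen.
same_after_nfc = (unicodedata.normalize("NFC", plain)
                  == unicodedata.normalize("NFC", with_zwj))

# Option (iii), the IDNA2003 status quo: nameprep's table B.1 maps ZWJ (and
# ZWNJ) to nothing, so both strings yield the same ACE label.
zwj_in_b1 = stringprep.in_table_b1(ZWJ)
same_ace = plain.encode("idna") == with_zwj.encode("idna")

print(same_after_nfc, zwj_in_b1, same_ace)  # False True True
```

Under option (i) the two strings remain distinct labels; under IDNA2003 they collapse to the same ACE label, which avoids this spoofing case but cannot represent the joiners that some scripts need.<br>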

<br>

Please also note that, if you permit ZWJ and/or ZWNJ as<br>

characters, we end up in exactly the same situation that you and<br>

others have objected to with Eszett and Final Sigma, i.e., an<br>

input string that converts to a different A-label in IDNA2003<br>

and IDNA2008.  I&#39;m prepared to live with that but, to the degree<br>

to which you consider it a problem so serious as to require<br>

rechartering and a completely different document strategy, I&#39;d<br>

like to better understand the exception and its implications.<br>

In particular, I don&#39;t see a section of your outline document<br>

that discusses the transition strategy that many people (I think<br>

including you, but could be wrong about that) have argued is<br>

absolutely essential if there are going to be any<br>

incompatibilities of that sort.<br>
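<br>

A minimal Python sketch of that A-label divergence for a joiner that a virama-based contextual rule would permit. The IDNA2003 side uses Python&#39;s built-in &quot;idna&quot; codec; the IDNA2008 side is a simplified assumption (NFC, keep the joiner, Punycode-encode with the &quot;xn--&quot; prefix) rather than a complete implementation of the IDNA2008 drafts.<br>

```python
import unicodedata

# Devanagari KA, VIRAMA, ZWJ, SSA: a joiner directly after a virama, which
# the contextual rules would permit.
label = "\u0915\u094d\u200d\u0937"

# IDNA2003 (Python's built-in "idna" codec): nameprep maps the ZWJ to
# nothing before Punycode encoding.
a_label_2003 = label.encode("idna")

# Simplified IDNA2008-style encoding (an assumption, not a full
# implementation): NFC-normalize, keep the ZWJ, and Punycode-encode.
a_label_2008 = b"xn--" + unicodedata.normalize("NFC", label).encode("punycode")

# The same input string yields two different A-labels.
print(a_label_2003 != a_label_2008)  # True
```

The IDNA2003 result is identical to the A-label of the same string without the ZWJ, so under these assumptions an IDNA2003 client and an IDNA2008 client would look up different DNS names for the same input.<br>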

<br>

best,<br>

<font color="#888888">   john<br>

</font><div><div></div><div class="Wj3C7c"><br>

_______________________________________________<br>

Idna-update mailing list<br>

<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>

</div></div></blockquote></div><br>