One other restriction that I forgot:<br><br>B4. Disallow sequences that are problematic for normalization<br><ul><li>These are listed at <a href="http://www.unicode.org/reports/tr15/#Corrigendum_5_Sequences">http://www.unicode.org/reports/tr15/#Corrigendum_5_Sequences

</a></li><li>This is a relatively uncontroversial change, even as a hard restriction, since the sequences are degenerate -- not meaningful in any language.<br></li></ul>Mark<br><br><div><span class="gmail_quote">On 11/27/06, 

<b class="gmail_sendername">Mark Davis</b> &lt;<a href="mailto:markdavis@google.com">markdavis@google.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

In order to assess the advantages and disadvantages of any approach, we need to have a good idea of the goals and the weights attached to them. Here is an initial take on some of the issues so far discussed, divided into categories.

<br><br>A. Loosen some restrictions on IDNA. The goal is to allow, <span style="font-style: italic;">where feasible</span>, the same kind of expressive capability in other languages that is now provided for in English. It should be recognized that not all reasonable words of every language will qualify: even in English, the lack of spaces and other punctuation forces compromises, and words like &quot;can't&quot; are disallowed.

<br><br>Here is what I've heard so far:<br><ol><li>Allow Unicode 5.0 characters</li><li>Provide for some mechanism for more quickly updating to successive Unicode versions.<br></li><li>Allow for combining marks at the end of bidi fields

</li><li>Allow for ZWJ/ZWNJ in limited contexts (see a previous message). </li></ol>Except for #4, which most people probably haven't looked through yet, these appear to be mostly uncontroversial.<br><br>B. Tighten some restrictions on IDNA. The purpose of this appears to be to reduce the opportunity for spoofing, so any proposed restriction should be assessed against that metric. That is: (a) Does the restriction reduce spoofing significantly? (b) Are there no other reasonable mechanisms for doing so?<br><br>

Here is what I've heard so far: <ol><li>Remove (or discourage) symbols and (most) punctuation.</li><ul><li>This appears to be mostly uncontroversial. While the vast majority of symbols and punctuation do not cause spoofing problems (I♥NY.com is not a problem, for example), there is not enough value in having them to be worth the effort.

</li></ul><li>Remove (or discourage) non-spacing marks.</li><ul><li>This is quite controversial. These marks are needed by many languages; excluding them is like removing vowels from English: &quot;<a href="http://microsoft.com" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

microsoft.com</a>&quot; becoming &quot;<a href="http://mcrsft.cm" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">mcrsft.cm</a>&quot;.</li><li>A very good case has to be made that (a) they cause problems, and (b) those problems can't feasibly be handled by other mechanisms.

</li></ul><li>Remove (or discourage) archaic / technical characters (characters not in common modern use)<br></li><ul><li>Unicode supplies a proposed list of such characters, in <a href="http://www.unicode.org/reports/tr39/#General_Security_Profile" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

http://www.unicode.org/reports/tr39/#General_Security_Profile</a>. However, it is recognized that any such list will need refinement and extension in the future. </li><li>Certain scripts are quite clearly archaic, and could be easily removed or discouraged.

</li><li>Judging whether a character in a modern script is archaic can be quite difficult, especially for scripts in broad use such as Latin, Arabic, and Cyrillic: such characters are often pressed into use for minority languages.

<br></li></ul></ol>A major issue is the choice between removal and discouragement. Removal has the very significant cost of breaking backwards compatibility, so a clear case has to be made that there is no feasible alternative to handle spoofing problems that would otherwise occur.<br><span class="sg"><br>Mark

</span></blockquote></div><br><br clear="all"><br>-- <br>Mark