We may not be that far apart on this -- sometimes the terminology may be getting in our way.<br><br>For example, the notion of a &quot;multi-state&quot; table is not all that different from what we effectively have in Unicode. Look at the diagram in 

<a href="http://www.unicode.org/reports/tr31/#Introduction">http://www.unicode.org/reports/tr31/#Introduction</a>, &quot;Figure 1. <span style="font-weight: 400;">Code Point 

        Categories 

        for Identifier Parsing&quot;.&nbsp;&nbsp;</span>That is, for the purposes of identifiers, we divide up characters into certain classes:<br><ol><li>Identifier characters (roughly letters, marks, decimal numbers)</li><li>Pattern characters (whitespace and &quot;syntax&quot; like +, -, ...)

</li><li>Other (assigned or unassigned)</li></ol>To see a list of the Pattern characters, see <a href="http://www.unicode.org/Public/UNIDATA/PropList.txt">http://www.unicode.org/Public/UNIDATA/PropList.txt</a>, and search for either:

<br><ul><li>Pattern_White_Space</li><li>Pattern_Syntax</li></ul>To see a list of the ID characters, see <a href="http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt">http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt

</a>, and search for<br><ul><li>XID_Continue</li></ul>And we have constraints on changes in the future (as one can see reading down in <a href="http://www.unicode.org/reports/tr31/">http://www.unicode.org/reports/tr31/</a>

). In particular, the pattern characters are ones that we will never change. I think one difference might be the motivation. The Pattern characters were designed to be an immutable set that could be used syntactically without worrying that one of them would later be added to identifiers, and for that purpose they were put out of bounds for inclusion in future identifiers. Because of those strict guarantees, we were extremely conservative about their contents. (The definition of this property was produced in response to requests from the W3C.)
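<br><br>As a rough illustration of that three-way split, here is a minimal Python sketch. It assumes the property files use the usual layout of one code point or range, a semicolon, and then the property name. The names load_property, classify and SAMPLE are made up for this example, and the SAMPLE lines are hand-written in that layout as a tiny stand-in, not copied from PropList.txt or DerivedCoreProperties.txt.

```python
import re

# A data line looks like "0021..002F    ; Pattern_Syntax # comment":
# one code point or a range, a semicolon, then the property name.
_LINE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def load_property(lines, name):
    """Collect every code point that the given lines assign to property `name`."""
    members = set()
    for line in lines:
        m = _LINE.match(line)
        if m and m.group(3) == name:
            first = int(m.group(1), 16)
            last = int(m.group(2) or m.group(1), 16)  # single code point: last == first
            members.update(range(first, last + 1))
    return members


def classify(cp, xid_continue, pattern_ws, pattern_syntax):
    """Return which of the three classes code point `cp` falls into."""
    if cp in xid_continue:
        return "identifier"
    if cp in pattern_ws or cp in pattern_syntax:
        return "pattern"
    return "other"  # assigned or unassigned, but in neither set


# Hand-written stand-in for the real files (assumption: same line layout).
SAMPLE = """\
# comment lines and blank lines are skipped
0009..000D    ; Pattern_White_Space
0020          ; Pattern_White_Space
0021..002F    ; Pattern_Syntax
0030..0039    ; XID_Continue
0041..005A    ; XID_Continue
0061..007A    ; XID_Continue
00B7          ; XID_Continue
""".splitlines()

XID = load_property(SAMPLE, "XID_Continue")
WS = load_property(SAMPLE, "Pattern_White_Space")
SYNTAX = load_property(SAMPLE, "Pattern_Syntax")

for ch in ("a", "+", " ", "\u00b7", "\u0378"):
    print(f"U+{ord(ch):04X}", classify(ord(ch), XID, WS, SYNTAX))
```

With the full files read in place of SAMPLE, the same two functions would give the complete classes; because the pattern sets are frozen, only the boundary between "identifier" and "other" can move between Unicode versions.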

<br><br>It certainly would be possible to have a similar set of characters for IDN, one that we guaranteed would never be added into IDNs in the future. But we&#39;d have to be quite careful that we didn&#39;t include by mistake the equivalent of the middle-dot.

<br><br>So if in the development of IDN tables, we had 3 classes of characters, listed below, I don&#39;t think it is much of a problem, as long as we are extremely conservative about class #2.<br><ol><li>characters in IDN

</li><li>characters that will never be added to IDN</li><li>characters (and unassigned code points) that could be added to IDN in the future</li></ol>I agree with Ken that as far as the implementer is concerned, class #1 is the key issue. And thus my main trepidation about spending time on #2 is just that it diverts us from #1. If people really felt that #2 was important for development, I&#39;d suggest using the following set as a basis:

<span><br></span><ul><li><span>Pattern_Syntax</span></li><li><span>minus &quot;-&quot;</span></li><li><span>plus ASCII characters currently disallowed by IDN (that is, ASCII except -, a-z, A-Z, 0-9)<br></span></li><li><span>

plus control &amp; format characters (except for ZWJ, ZWNJ)<br></span></li></ul><span>Mark</span>
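<br><br>The candidate set listed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: never_in_idn and IDN_ASCII are made-up names, "control and format characters" is read as the general categories Cc and Cf, and the caller has to supply the Pattern_Syntax set (for instance parsed from PropList.txt), since Python's unicodedata module does not expose that property.

```python
import string
import unicodedata

ZWJ, ZWNJ = 0x200D, 0x200C

# ASCII code points that IDN currently allows: "-", a-z, A-Z, 0-9.
IDN_ASCII = {ord(c) for c in "-" + string.ascii_letters + string.digits}


def never_in_idn(cp, pattern_syntax):
    """True if code point `cp` belongs to the proposed 'never in IDN' set.

    `pattern_syntax` is a set of code points supplied by the caller
    (assumption: taken from the Pattern_Syntax entries of PropList.txt).
    """
    if cp == ord("-"):
        return False  # Pattern_Syntax minus "-"
    if cp in pattern_syntax:
        return True
    if cp in range(0x80) and cp not in IDN_ASCII:
        return True  # ASCII currently disallowed by IDN
    if cp in (ZWJ, ZWNJ):
        return False  # the two format characters that are kept out of the set
    # Assumption: control = category Cc, format = category Cf.
    return unicodedata.category(chr(cp)) in ("Cc", "Cf")


# Hand-written stand-in for the real Pattern_Syntax data (ASCII "!" through "/").
sample_syntax = set(range(0x21, 0x30))

for ch in ("+", "-", "_", "a", "\u200b", "\u200d", "\u00b7"):
    print(f"U+{ord(ch):04X}", never_in_idn(ord(ch), sample_syntax))
```

Under these assumptions the middle dot (U+00B7) stays outside the set, because it is neither in Pattern_Syntax nor a control or format character, so it remains available for IDN.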