Let me try again. (And this is not your fault -- I've never liked the way that this particular terminology is handled, since it leads to misunderstandings.)<br><br>Given a code point X:<br><ol><li><i>gc=Cn</i> means that there is no <b>character</b> assigned for X. That is, X is not an <b>assigned character</b>. Note that the long name for gc=Cn is <i>General_Category=Unassigned</i> (see <a href="http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt">http://unicode.org/Public/UNIDATA/PropertyValueAliases.txt</a>)</li>
<li><i>Noncharacter=true</i> means that X, although <b>not</b> assigned as a character, is given a special function. In that sense, it is an <b>assigned code point</b>, just not assigned as a character.<br></li></ol>There are some other oddities: for example, surrogate code points (gc=Cs) are also not assigned characters, but they are not gc=Cn (Unassigned) either. Those, however, don't seem to cause people much conceptual trouble. Functionally, as I've said, noncharacters are best thought of as "super private use" characters, and they could have been incorporated into the general category under that rubric, but that view of them only evolved over time.<br>
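<br>As a rough sketch of the distinction above, assuming Python's standard unicodedata module (its data follows whatever Unicode version the interpreter bundles). The helper names is_noncharacter and classify, and the label strings, are only illustrative; they are not taken from the standard or from any library:<br>

```python
import unicodedata


def is_noncharacter(cp):
    """Return True for the 66 noncharacter code points.

    These are U+FDD0..U+FDEF plus the last two code points of each
    of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ... U+10FFFF).
    """
    return cp in range(0xFDD0, 0xFDF0) or (cp & 0xFFFE) == 0xFFFE


def classify(cp):
    """Illustrative labels only; the wording is an assumption of this sketch."""
    gc = unicodedata.category(chr(cp))
    if gc == "Cn":
        # gc=Cn alone cannot tell these two cases apart.
        if is_noncharacter(cp):
            return "noncharacter"
        return "unassigned (reserved)"
    if gc == "Cs":
        return "surrogate"
    if gc == "Co":
        return "private use"
    return "assigned character"


for cp in (0x0041, 0xD800, 0xE000, 0xFDD0, 0xFFFF, 0x40000):
    print("U+%04X" % cp, unicodedata.category(chr(cp)), classify(cp))
```

Both U+FFFF and U+40000 report gc=Cn here; only the separate noncharacter check tells them apart, so a test for truly unassigned code points has to combine the two properties.<br>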
<br>This is all water under the bridge, mostly due to history, as the architecture grew in unforeseen ways, and stability policies put into place a long time ago prevented changes that would have made it conceptually simpler.<br>
<br>Does that help any?<br><br>Mark<br><br><div class="gmail_quote">On Fri, May 2, 2008 at 7:31 PM, Patrik Fältström &lt;<a href="mailto:patrik@frobbit.se">patrik@frobbit.se</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
On 30 apr 2008, at 22.38, Mark Davis wrote:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
The code points that are unassigned (gc=Cn) but that should be<br>
DISALLOWED are all and only the Noncharacters.<br>
</blockquote>
<br>
What has confused me all along is that I interpreted what Mark says as meaning that gc=Cn gives the unassigned codepoints.<br>
<br>
That is not true, which shows that I misunderstood what he wrote here.<br>
<br>
gc=Cn gives the unassigned codepoints PLUS the Noncharacter ones.<br>
<br>
So, one can NOT use gc=Cn as a test for unassigned codepoints. It is more complicated than that.<br><font color="#888888">
<br>
Patrik<br>
<br>
</font></blockquote></div>