comments below<br><br><div class="gmail_quote">On Wed, Apr 30, 2008 at 1:53 PM, Paul Hoffman &lt;<a href="mailto:phoffman@imc.org">phoffman@imc.org</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="Ih2E3d">At 1:38 PM -0700 4/30/08, Mark Davis wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

It *is* related to Noncharacters. Default_Ignorable_Code_Point is a derived property. The code points that are unassigned (gc=Cn) but that should be DISALLOWED are all and only the Noncharacters.<br>

</blockquote>

<br></div>

Then I&#39;m really confused. From the new draft:<br>

<br>

2.1.3. &nbsp;IgnorableProperties (C)<br>

<br>

 &nbsp; C: property(cp) is in {Default_Ignorable_Code_Point, White_Space,<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Noncharacter_Code_Point}<br>

<br>

 &nbsp; This category is used to group codepoints that are not recommended<br>

 &nbsp; for use in identifiers. &nbsp;In general, these codepoints are not<br>

 &nbsp; suitable for use for IDN.<br>

<br>

 &nbsp; The definition for Default_Ignorable_Code_Point can be found in<br>

 &nbsp; DerivedCoreProperties.txt [1] (and erratum of 2007-January-25 [2])<br>

 &nbsp; and is<br>

<br>

 &nbsp; Other_Default_Ignorable_Code_Point + Cf + Cc + Cs<br>

 &nbsp; + Noncharacter_Code_Point + Variation_Selector<br>

 &nbsp; - White_Space - FFF9..FFFB (Annotation Characters)<br>

</blockquote><div><br>That text has not been updated to U5.1. As I said earlier: <br><br><div style="margin-left: 40px;">&quot;Note that there was a one-time cleanup of the Default Ignorable Code Point values in

Unicode 5.1.0, specifically to get it into good shape for IDNA

(<a href="http://www.unicode.org/versions/Unicode5.1.0/" target="_blank">http://www.unicode.org/versions/Unicode5.1.0/</a>

- see &quot;Rendering Default

Ignorable Code Points&quot; and the section following). This changed the

composition, so if

noncharacters are to be DISALLOWED, then they need to be specifically

mentioned. Functionally, it doesn&#39;t make a lot of difference, since the

Noncharacter_Code_Point values are immutable, and will always be

unassigned (gc=Cn), so they will never be part of valid labels. But

they can be specifically excluded by making Noncharacter_Code_Point be

specifically DISALLOWED, and for consistency I&#39;d recommend doing that

in the tables document. BTW, here are the code points: <a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:Noncharacter_Code_Point=True:%5D" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]</a>&quot;<br>

</div><br>The U5.1 definition is (<a href="http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt">http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt</a>):<br><pre># Derived Property: Default_Ignorable_Code_Point<br>

#  Generated from<br>#    Other_Default_Ignorable_Code_Point<br>#  + Cf (Format characters)<br>#  + Variation_Selector<br>#  - White_Space<br>#  - FFF9..FFFB (Annotation Characters)<br>#  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)</pre>
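<br>Since Noncharacter_Code_Point is no longer part of that derivation, a rule that wants those code points DISALLOWED has to name them on its own. Here is a minimal Python sketch of that set, assuming the standard 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes). The names <code>noncharacter_code_points</code>, <code>NONCHARACTERS</code> and <code>is_disallowed_noncharacter</code> are made up for illustration and do not come from the tables draft. The gc check uses whatever Unicode version ships with the Python interpreter, which is not necessarily 5.1.<br>

```python
import unicodedata


def noncharacter_code_points():
    """Return the noncharacter code points as a sorted list of integers."""
    # U+FDD0..U+FDEF: 32 contiguous noncharacters in the BMP.
    cps = list(range(0xFDD0, 0xFDF0))
    # The last two code points of every plane (U+nFFFE and U+nFFFF).
    for plane in range(17):
        base = plane * 0x10000
        cps.extend((base + 0xFFFE, base + 0xFFFF))
    return sorted(cps)


# Hypothetical lookup table for an explicit DISALLOWED rule.
NONCHARACTERS = frozenset(noncharacter_code_points())


def is_disallowed_noncharacter(cp):
    """Illustrative helper: True if cp is a noncharacter."""
    return cp in NONCHARACTERS


if __name__ == "__main__":
    # Size of the set, and whether every member is unassigned (gc=Cn).
    print(len(NONCHARACTERS))
    print(all(unicodedata.category(chr(cp)) == "Cn" for cp in NONCHARACTERS))
```

Run as a script, it prints the size of the set and whether every member reports gc=Cn.<br><br>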

Because the other changes (Cs, Cc, some Cf) are already excluded due to other Categories, Noncharacters are the only change relevant to IDNAbis.<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Why have that whole list of things for &quot;Default_Ignorable_Code_Point&quot; if all we want is Noncharacter_Code_Point, which is already in the list for C? Why not have it at all?</blockquote><div><br>It is not the only thing. Some of the components are redundant (already put in DISALLOWED via other Categories); the key ones are the Variation_Selector characters.<br>

<br>Does that help make things any clearer?<br></div></div><br>-- <br>Mark