comments below<br><br><div class="gmail_quote">On Wed, Apr 30, 2008 at 1:53 PM, Paul Hoffman <<a href="mailto:phoffman@imc.org">phoffman@imc.org</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">At 1:38 PM -0700 4/30/08, Mark Davis wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
It *is* related to Noncharacters. Default_Ignorable_Code_Point is a derived property. The code points that are unassigned (gc=Cn) but that should be DISALLOWED are all and only the Noncharacters.<br>
</blockquote>
<br></div>
Then I'm really confused. From the new draft:<br>
<br>
2.1.3. IgnorableProperties (C)<br>
<br>
C: property(cp) is in {Default_Ignorable_Code_Point, White_Space,<br>
Noncharacter_Code_Point}<br>
<br>
This category is used to group codepoints that are not recommended<br>
for use in identifiers. In general, these codepoints are not<br>
suitable for use for IDN.<br>
<br>
The definition for Default_Ignorable_Code_Point can be found in<br>
DerivedCoreProperties.txt [1] (and erratum of 2007-January-25 [2])<br>
and is<br>
<br>
Other_Default_Ignorable_Code_Point + Cf + Cc + Cs<br>
+ Noncharacter_Code_Point + Variation_Selector<br>
- White_Space - FFF9..FFFB (Annotation Characters)<br>
</blockquote><div><br>That text has not been updated to U5.1. As I said earlier: <br><br><div style="margin-left: 40px;">"Note that there was a one-time cleanup of the Default Ignorable Code Point values in
Unicode 5.1.0, specifically to get it into good shape for IDNA
(<a href="http://www.unicode.org/versions/Unicode5.1.0/" target="_blank">http://www.unicode.org/versions/Unicode5.1.0/</a>
- see "Rendering Default
Ignorable Code Points" and the section following). This changed the
composition, so if
noncharacters are to be DISALLOWED, then they need to be specifically
mentioned. Functionally, it doesn't make a lot of difference, since the
Noncharacter_Code_Point values are immutable, and will always be
unassigned (gc=Cn), so they will never be part of valid labels. But
they can be specifically excluded by making Noncharacter_Code_Point be
specifically DISALLOWED, and for consistency I'd recommend doing that
in the tables document. BTW, here are the code points: <a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B:Noncharacter_Code_Point=True:%5D" target="_blank">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Noncharacter_Code_Point=True:]</a>"<br>
</div><br>The U5.1 definition is (<a href="http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt">http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt</a>):<br><pre># Derived Property: Default_Ignorable_Code_Point<br>
# Generated from<br># Other_Default_Ignorable_Code_Point<br># + Cf (Format characters)<br># + Variation_Selector<br># - White_Space<br># - FFF9..FFFB (Annotation Characters)<br># - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)</pre>
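<br>For illustration only, here is a minimal Python sketch of that derivation as plain set arithmetic. The helper names (expand, format_characters, default_ignorable) are hypothetical, the sample inputs are small hand-picked subsets rather than the full property values, and Python's unicodedata module reflects whatever Unicode version ships with the interpreter, which is not necessarily 5.1:<br>

```python
import unicodedata


def expand(*ranges):
    """Expand inclusive (first, last) pairs into a set of code points."""
    out = set()
    for first, last in ranges:
        out.update(range(first, last + 1))
    return out


# Exclusions named in the U5.1 header comment quoted above.
ANNOTATION_CHARACTERS = expand((0xFFF9, 0xFFFB))
VISIBLE_CF = expand((0x0600, 0x0603), (0x06DD, 0x06DD), (0x070F, 0x070F))


def format_characters():
    """All gc=Cf code points, per the Unicode version bundled with Python."""
    return {cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Cf"}


def default_ignorable(other_di, cf, variation_selectors, white_space):
    """Other_DI + Cf + Variation_Selector - White_Space - the two exclusions."""
    included = set(other_di) | set(cf) | set(variation_selectors)
    return included - set(white_space) - ANNOTATION_CHARACTERS - VISIBLE_CF


# Assumption: illustrative subsets only, not the real property tables.
sample = default_ignorable(
    other_di={0x034F},
    cf={0x00AD, 0x0600, 0x200D, 0xFFF9},
    variation_selectors=expand((0xFE00, 0xFE0F)),
    white_space={0x0020},
)
```

In this sketch U+0600 and U+FFF9 are dropped by the exclusions even though both are passed in as Cf. Noncharacter_Code_Point is no longer a term in the formula, so a noncharacter such as U+FFFE can only be kept out of labels by listing that property separately.<br>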
Because the other changes (Cs, Cc, some Cf) are already excluded due to other Categories, Noncharacters are the only change relevant to IDNAbis.<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Why have that whole list of things for "Default_Ignorable_Code_Point" if all we want is Noncharacter_Code_Point, which is already in the list for C? Why not have it at all?</blockquote><div><br>It is not the only thing. Some of them are redundant (already put in DISALLOWED via other Categories); the key ones are the Variation_Selector characters.<br>
<br>Does that help make things any clearer?<br></div></div><br>-- <br>Mark