Default_Ignorable_Code_Point
Simon Josefsson
simon at josefsson.org
Tue Mar 22 16:57:56 CET 2011
Mark Davis ☕ <mark at macchiato.com> writes:
> In http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt
>
> Search in
> page for "Default_I" to find it.
Thanks -- I was expecting the text at the top...
The file says:
# Derived Property: Default_Ignorable_Code_Point
# Generated from
# Other_Default_Ignorable_Code_Point
# + Cf (Format characters)
# + Variation_Selector
# - White_Space
# - FFF9..FFFB (Annotation Characters)
# - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN
034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER
...
I'm not sure how to interpret this.
Is the meaning of Default_Ignorable_Code_Point in RFC 5892:
1) The actual list of code points carrying the
Default_Ignorable_Code_Point flag, which is Unicode version specific?
2) All code points that fulfill these criteria, for all future Unicode
versions:
Other_Default_Ignorable_Code_Point
+ Cf (Format characters)
+ Variation_Selector
- White_Space
- FFF9..FFFB (Annotation Characters)
- 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
3) Whatever Default_Ignorable_Code_Point is defined to be in whichever
Unicode version is the most recent?
I am guessing 2) because the reference in RFC 5892 is explicitly to a
versioned Unicode data file rather than to TR44.
(Note that the definitions are slightly different in 5.2.0 and 6.0.0.)
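To make the difference between 1) and 2) concrete, here is a minimal
Python sketch of the derivation rule quoted from the 5.2.0 file header.
The function name and the tiny input sets are my own illustration, not
real Unicode property data, and the two exclusion ranges are hard-coded
from the 5.2.0 comment only. Under reading 1) the output list would be
frozen; under reading 2) the rule would be re-run against each new
version's input properties.

```python
# Toy illustration only: the input sets below are hand-picked samples,
# not the real Unicode 5.2.0 property data.

# Exclusions copied from the 5.2.0 DerivedCoreProperties.txt header comment.
ANNOTATION = set(range(0xFFF9, 0xFFFB + 1))
VISIBLE_CF = set(range(0x0600, 0x0603 + 1)) | {0x06DD, 0x070F}


def derive_default_ignorable(other_di, cf, variation_selector, white_space):
    """Apply the '+' and '-' rule from the file header to the given sets.

    All four arguments are sets of integer code points (hypothetical
    inputs; a real tool would load them from the UCD files).
    """
    included = set(other_di) | set(cf) | set(variation_selector)
    excluded = set(white_space) | ANNOTATION | VISIBLE_CF
    return included - excluded


sample = derive_default_ignorable(
    other_di={0x034F},                     # COMBINING GRAPHEME JOINER (Mn)
    cf={0x00AD, 0x0600, 0x06DD, 0xFFF9},   # a few Cf code points
    variation_selector={0xFE00},
    white_space={0x0020},
)
print(sorted(f"{cp:04X}" for cp in sample))
```

With these sample inputs, 0600, 06DD and FFF9 are dropped by the
exclusions, while 00AD, 034F and FE00 remain.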
/Simon
More information about the Idna-update
mailing list