Default_Ignorable_Code_Point

Simon Josefsson simon at josefsson.org
Tue Mar 22 16:57:56 CET 2011


Mark Davis ☕ <mark at macchiato.com> writes:

> In http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt
>
> Search in page for "Default_I" to find it.

Thanks -- I was expecting the text at the top...

The file says:

# Derived Property: Default_Ignorable_Code_Point
#  Generated from
#    Other_Default_Ignorable_Code_Point
#  + Cf (Format characters)
#  + Variation_Selector
#  - White_Space
#  - FFF9..FFFB (Annotation Characters)
#  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)

00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
...

I'm not sure how to interpret this.

Is the meaning of Default_Ignorable_Code_Point in RFC 5892:

1) The actual list of code points, which is Unicode version specific,
with the Default_Ignorable_Code_Point flag?

2) All code points that fulfill these criteria, for all future Unicode
versions:

    Other_Default_Ignorable_Code_Point
  + Cf (Format characters)
  + Variation_Selector
  - White_Space
  - FFF9..FFFB (Annotation Characters)
  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)

3) Whatever Default_Ignorable_Code_Point is defined to be in whatever
Unicode version is the most recent.
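
To make reading 2) concrete, here is a rough sketch of the header rule
as plain set algebra. The function name and the tiny sample inputs are
my own illustration, not anything defined by RFC 5892 or the UCD; a real
run would need the full property sets for one Unicode version.

```python
# Exclusions spelled out in the 5.2.0 file header quoted above.
ANNOTATION = set(range(0xFFF9, 0xFFFB + 1))
VISIBLE_CF = set(range(0x0600, 0x0603 + 1)) | {0x06DD, 0x070F}


def derive_default_ignorable(odi, cf, variation_selectors, white_space):
    """Apply the '+' and '-' steps of the header rule to caller-supplied sets."""
    included = odi | cf | variation_selectors
    return included - white_space - ANNOTATION - VISIBLE_CF


# Hand-picked sample members only (assumption: not the complete sets).
odi = {0x034F}                  # Other_Default_Ignorable_Code_Point
cf = {0x00AD, 0x0600, 0xFFF9}   # General_Category=Cf
vs = {0xFE00}                   # Variation_Selector
ws = {0x0020}                   # White_Space

result = derive_default_ignorable(odi, cf, vs, ws)
print([f"{cp:04X}" for cp in sorted(result)])
```

Under this reading the outcome changes whenever the input properties
gain members in a later Unicode version, even though the rule itself
stays fixed.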

I am guessing 2) because the reference in RFC 5892 is explicitly to a
versioned Unicode data file rather than to TR44.

(Note that the definitions are slightly different in 5.2.0 and 6.0.0.)
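
Reading 1), by contrast, amounts to taking the list from one specific
versioned file. A minimal sketch of that, assuming the line layout shown
in the excerpt above ("XXXX ; Property # comment", with "XXXX..YYYY" for
ranges); the helper name is hypothetical:

```python
import re

# Matches "00AD ; Prop # ..." and "0600..0603 ; Prop # ...".
_LINE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def code_points_with_property(lines, prop):
    """Collect the code points that a UCD-style listing assigns to prop."""
    result = set()
    for line in lines:
        m = _LINE.match(line.strip())
        if m is None or m.group(3) != prop:
            continue  # comment, blank line, or a different property
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)
        result.update(range(first, last + 1))
    return result


# Only the lines quoted earlier in this message, not the whole file.
excerpt = [
    "# Derived Property: Default_Ignorable_Code_Point",
    "00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN",
    "034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER",
]
dicp = code_points_with_property(excerpt, "Default_Ignorable_Code_Point")
print([f"{cp:04X}" for cp in sorted(dicp)])
```

Running the same function over the 5.2.0 and 6.0.0 files and comparing
the two sets would show exactly where the versions differ.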

/Simon

