Default_Ignorable_Code_Point

Mark Davis ☕ mark at macchiato.com
Tue Mar 22 17:19:38 CET 2011


The RFC is written to be version-neutral. I think it was just a mistake that
it refers to a specific version of the file.

The data in the UCD establish the exact definition of the different
properties, subject to the stability policies on
http://unicode.org/policies/stability_policy.html. That is, the derivation
of the property is not normative; the data are.

See also Section 2.1 in http://www.unicode.org/reports/tr44/#Simple_Derived.
In particular:

Implementations should simply use the derived properties, and should not try
to rederive them from lists of simple properties and collections of rules,
because of the chances for error and divergence when doing so.

Definitions of property derivations are provided for information only,
typically in comment fields in the data files. Such definitions may be
refactored, refined, or corrected over time.
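
As a minimal sketch of "simply use the derived properties", the following
Python reads the property directly from lines in the DerivedCoreProperties.txt
format instead of rederiving it from the rules in the comments. The names
parse_property and SAMPLE are illustrative assumptions, not part of any
Unicode tooling; a real implementation would read the whole data file for
the Unicode version it targets.

```python
import re

# Matches data lines such as "00AD ; Prop" or "0600..0603 ; Prop"
# (after the trailing "# ..." comment has been removed).
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def parse_property(lines, prop="Default_Ignorable_Code_Point"):
    """Collect the code points listed for `prop` in UCD-style data lines."""
    points = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        m = _LINE.match(data)
        if m is None or m.group(3) != prop:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)  # single code point or range
        points.update(range(first, last + 1))
    return points


# Two data lines quoted from the 5.2.0 file; the derivation comment is
# skipped by the parser, which is the point: only the data is used.
SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
""".splitlines()

ignorable = parse_property(SAMPLE)
print(0x00AD in ignorable, 0x034F in ignorable)
```

Because the set comes from the listed data rather than from the derivation
in the comments, a later refactoring of those comments does not change the
result for a given version of the file.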

Mark

*— Il meglio è l’inimico del bene —*


On Tue, Mar 22, 2011 at 08:57, Simon Josefsson <simon at josefsson.org> wrote:

> Mark Davis ☕ <mark at macchiato.com> writes:
>
> > In http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt
> >
> > Search in page for "Default_I" to find it.
>
> Thanks -- I was expecting the text at the top...
>
> The file says:
>
> # Derived Property: Default_Ignorable_Code_Point
> #  Generated from
> #    Other_Default_Ignorable_Code_Point
> #  + Cf (Format characters)
> #  + Variation_Selector
> #  - White_Space
> #  - FFF9..FFFB (Annotation Characters)
> #  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>
> 00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
> 034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
> ...
>
> I'm not sure how to interpret this.
>
> Is the meaning of Default_Ignorable_Code_Point in RFC 5892:
>
> 1) The actual list of code points, which is Unicode version specific,
> with the Default_Ignorable_Code_Point flag?
>
> 2) All code points that fulfill these criteria, for all future Unicode
> versions:
>
>    Other_Default_Ignorable_Code_Point
>  + Cf (Format characters)
>  + Variation_Selector
>  - White_Space
>  - FFF9..FFFB (Annotation Characters)
>  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>
> 3) Whatever Default_Ignorable_Code_Point is defined to be in whichever
> Unicode version is the most recent.
>
> I am guessing 2) because the reference in RFC 5892 is explicitly to a
> versioned Unicode data file rather than to TR44.
>
> (Note that the definitions are slightly different in 5.2.0 and 6.0.0.)
>
> /Simon