Default_Ignorable_Code_Point

Patrik Fältström patrik at frobbit.se
Tue Mar 22 17:36:17 CET 2011


On 22 mar 2011, at 17.19, Mark Davis ☕ wrote:

> The RFC is written to be version-neutral. I think it was just a mistake that
> it refers to a specific version of the file.

I would say so, yes, although it is fine for informal references to point to specific versions, and they are there for a reason. It should have been clearer that for 5.2 the definition is... and then given the informal reference.

   Patrik

> The data in the UCD establish the exact definition of the different
> properties, subject to the stability policies on
> http://unicode.org/policies/stability_policy.html. That is, the derivation
> of the property is not normative; the data is.
> 
> See also Section 2.1 in http://www.unicode.org/reports/tr44/#Simple_Derived.
> In particular:
> 
> Implementations should simply use the derived properties, and should not try
> to rederive them from lists of simple properties and collections of rules,
> because of the chances for error and divergence when doing so.
> 
> Definitions of property derivations are provided for information only,
> typically in comment fields in the data files. Such definitions may be
> refactored, refined, or corrected over time.
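
As a rough illustration of "simply use the derived properties", the sketch below reads the property straight from lines in the DerivedCoreProperties.txt format instead of rederiving it. The helper name parse_property and the embedded SAMPLE lines are assumptions made for this example; a real implementation would read the complete file for the Unicode version it targets.

```python
import re

# "XXXX" or "XXXX..YYYY", then "; Property_Name". Comments are stripped before matching.
_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def parse_property(lines, prop):
    """Return the set of code points listed for `prop` (hypothetical helper)."""
    result = set()
    for raw in lines:
        # Everything after '#' is an informative comment, not data.
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue
        m = _LINE.match(data)
        if m is None or m.group(3) != prop:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)  # single code point if no range
        result.update(range(first, last + 1))
    return result


# Small illustrative sample in the file's format, not the full data file.
SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
180B..180D    ; Default_Ignorable_Code_Point # Mn   [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
""".splitlines()

dicp = parse_property(SAMPLE, "Default_Ignorable_Code_Point")
print(sorted(f"{cp:04X}" for cp in dicp))
```

The "Generated from" comment block is skipped along with every other comment, which matches the point that the derivation is informative and only the listed data counts.
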
> 
> Mark
> 
> *— Il meglio è l’inimico del bene —*
> 
> 
> On Tue, Mar 22, 2011 at 08:57, Simon Josefsson <simon at josefsson.org> wrote:
> 
>> Mark Davis ☕ <mark at macchiato.com> writes:
>> 
>>> In http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt
>>> 
>>> Search in page for "Default_I" to find it.
>> 
>> Thanks -- I was expecting the text at the top...
>> 
>> The file says:
>> 
>> # Derived Property: Default_Ignorable_Code_Point
>> #  Generated from
>> #    Other_Default_Ignorable_Code_Point
>> #  + Cf (Format characters)
>> #  + Variation_Selector
>> #  - White_Space
>> #  - FFF9..FFFB (Annotation Characters)
>> #  - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>> 
>> 00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
>> 034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
>> ...
>> 
>> I'm not sure how to interpret this.
>> 
>> Is the meaning of Default_Ignorable_Code_Point in RFC 5892:
>> 
>> 1) The actual list of code points, which is Unicode version specific,
>> with the Default_Ignorable_Code_Point flag?
>> 
>> 2) All code points that fulfill these criteria, for all future Unicode
>> versions:
>> 
>>   Other_Default_Ignorable_Code_Point
>> + Cf (Format characters)
>> + Variation_Selector
>> - White_Space
>> - FFF9..FFFB (Annotation Characters)
>> - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>> 
>> 3) Whatever Default_Ignorable_Code_Point is defined to be in whichever
>> Unicode version is the most recent.
>> 
>> I am guessing 2) because the reference in RFC 5892 is explicitly to a
>> versioned Unicode data file rather than to TR44.
>> 
>> (Note that the definitions are slightly different in 5.2.0 and 6.0.0.)
>> 
>> /Simon
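
To make the gap between reading 1) and reading 2) concrete, here is a minimal sketch that applies the 5.2.0 comment-block rule as plain set arithmetic and compares the result with a published list. The function names rederive and divergence, and the tiny input sets, are assumptions made for illustration only; they are small subsets and not the real property data.

```python
# Exclusions named in the 5.2.0 comment block.
ANNOTATION = set(range(0xFFF9, 0xFFFB + 1))
VISIBLE_CF = set(range(0x0600, 0x0603 + 1)) | {0x06DD, 0x070F}


def rederive(other_dicp, cf, variation_selector, white_space):
    """Apply the informative 5.2.0 rule to the given property sets (hypothetical helper)."""
    return ((other_dicp | cf | variation_selector)
            - white_space - ANNOTATION - VISIBLE_CF)


def divergence(published, rederived):
    """Report code points on which the two readings disagree (hypothetical helper)."""
    return {
        "only_published": published - rederived,
        "only_rederived": rederived - published,
    }


# Deliberately incomplete toy inputs: 180C is missing from the
# Variation_Selector subset to mimic an error in the rederivation.
toy_rederived = rederive(
    other_dicp={0x034F},
    cf={0x00AD, 0x0600, 0xFFF9},
    variation_selector={0x180B},
    white_space=set(),
)
toy_published = {0x00AD, 0x034F, 0x180B, 0x180C}

report = divergence(toy_published, toy_rederived)
print({k: sorted(hex(cp) for cp in v) for k, v in report.items()})
```

Any non-empty entry in the report marks a place where an implementation following reading 2) would differ from one that uses the published list for that version.
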
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>> 


