Default_Ignorable_Code_Point
Patrik Fältström
patrik at frobbit.se
Tue Mar 22 17:36:17 CET 2011
On 22 mar 2011, at 17.19, Mark Davis ☕ wrote:
> The RFC is written to be version-neutral. I think it was just a mistake that
> it refers to a specific version of the file.
I would say so, yes, although it is OK for informal references to point to specific versions, and they are there for a reason. It should have been clearer that for 5.2 the definition is... And then the informal reference.
Patrik
> The data in the UCD establish the exact definition of the different
> properties, subject to the stability policies on
> http://unicode.org/policies/stability_policy.html. That is, the derivation
> of the property is not normative; the data is.
>
> See also Section 2.1 in http://www.unicode.org/reports/tr44/#Simple_Derived.
> In particular:
>
> Implementations should simply use the derived properties, and should not try
> to rederive them from lists of simple properties and collections of rules,
> because of the chances for error and divergence when doing so.
>
> Definitions of property derivations are provided for information only,
> typically in comment fields in the data files. Such definitions may be
> refactored, refined, or corrected over time.
>
> Mark
>
> *— Il meglio è l’inimico del bene —*
>
>
> On Tue, Mar 22, 2011 at 08:57, Simon Josefsson <simon at josefsson.org> wrote:
>
>> Mark Davis ☕ <mark at macchiato.com> writes:
>>
>>> In http://unicode.org/Public/5.2.0/ucd/DerivedCoreProperties.txt
>>>
>>> Search in the page for "Default_I" to find it.
>>
>> Thanks -- I was expecting the text at the top...
>>
>> The file says:
>>
>> # Derived Property: Default_Ignorable_Code_Point
>> # Generated from
>> # Other_Default_Ignorable_Code_Point
>> # + Cf (Format characters)
>> # + Variation_Selector
>> # - White_Space
>> # - FFF9..FFFB (Annotation Characters)
>> # - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>>
>> 00AD ; Default_Ignorable_Code_Point # Cf SOFT HYPHEN
>> 034F ; Default_Ignorable_Code_Point # Mn COMBINING GRAPHEME JOINER
>> ...
>>
>> I'm not sure how to interpret this.
>>
>> Is the meaning of Default_Ignorable_Code_Point in RFC 5892:
>>
>> 1) The actual list of code points, which is Unicode version specific,
>> with the Default_Ignorable_Code_Point flag?
>>
>> 2) All code points that fulfill these criteria, for all future Unicode
>> versions:
>>
>> Other_Default_Ignorable_Code_Point
>> + Cf (Format characters)
>> + Variation_Selector
>> - White_Space
>> - FFF9..FFFB (Annotation Characters)
>> - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
>>
>> 3) Whatever Default_Ignorable_Code_Point is defined to be in whichever
>> Unicode version is the most recent.
>>
>> I am guessing 2) because the reference in RFC 5892 is explicitly to a
>> versioned Unicode data file rather than to TR44.
>>
>> (Note that the definitions are slightly different in 5.2.0 and 6.0.0.)
>>
>> /Simon
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
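
To illustrate the point quoted above from TR44 (use the derived data, do not rederive it from the rules in the comments), here is a minimal sketch in Python of reading a property straight out of DerivedCoreProperties.txt-style lines. The function names parse_property and has_property are made up for this example, and SAMPLE contains only the two data lines quoted earlier in this thread, not the full file; a real implementation would read the complete data file for the Unicode version it targets.

```python
import re

# Matches "XXXX ; Prop" and "XXXX..YYYY ; Prop" once the trailing
# "# comment" part of a data line has been stripped.
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)


def parse_property(lines, prop):
    """Return a list of (first, last) code point ranges carrying `prop`.

    Hypothetical helper: it only looks at the data lines and ignores the
    informational derivation given in the comment fields.
    """
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        m = LINE_RE.match(data)
        if m is None or m.group(3) != prop:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2), 16) if m.group(2) else first
        ranges.append((first, last))
    return ranges


def has_property(ranges, cp):
    """True if code point `cp` falls inside any of the parsed ranges."""
    return any(first <= cp <= last for first, last in ranges)


# Only the two lines quoted in this thread; the real file has many more.
SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
""".splitlines()

DICP = parse_property(SAMPLE, "Default_Ignorable_Code_Point")
```

With this approach the answer to "what is in the property" comes from whichever version of the data file is loaded, which is the behaviour Mark describes; it does not settle which version RFC 5892 intends.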