Simple Properties and Derived Properties (was: Re: Normalization of Hangul)
Kenneth Whistler
kenw at sybase.com
Thu Feb 21 00:33:12 CET 2008
Patrik said:
> I do though in the Ruby implementation now use the derived property
> value Default_Ignorable_Code_Point, and not the base properties. I
> would like to go back to use the original properties -- at least for
> the implementation -- so I can really say I have implemented things
> from the base properties of the Unicode Standard when I am asked
> whether there are interoperable implementations.
I would not advise doing so. Using Default_Ignorable_Code_Point
is the correct thing to do, and is using the Unicode
character properties as intended.
The Unicode Standard doesn't actually have a concept of
a "base property" (in part because that term would be
ambiguous for "property of a base character"), but I
think what you are getting at is the distinction made
on p. 89 of the standard between:
D45 Simple property
D46 Derived property
The thing is, however, that a "derived property" in the Unicode
Standard is not a second-class citizen.
Many simple properties in the Unicode Character Database are
defined *only* to contribute to the derivation and/or
stabilization of important derived properties.
Among the first-class *derived* properties important for
implementations are: Uppercase, Lowercase, XID_Start,
XID_Continue, Math, and yes, Default_Ignorable_Code_Point
(all listed in DerivedCoreProperties.txt). Another collection
of derived properties, important for optimized implementations of
normalization, is in DerivedNormalizationProps.txt.
Implementations should *use* the derived properties, and not
try to rederive them from lists of simple properties and
collections of rules, because of the chances for error and
divergence when doing so.
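As a minimal sketch of using a derived property at face value, the Python below reads code point ranges in the semicolon-delimited format of DerivedCoreProperties.txt and answers membership queries directly from those ranges. It does not try to recompute anything from simple properties. The embedded SAMPLE lines are an abbreviated excerpt for illustration only. The helper names load_property and make_lookup are hypothetical and are not part of any Unicode or Ruby API.

```python
from bisect import bisect_right

# Abbreviated, illustrative lines in the DerivedCoreProperties.txt format:
#   <code point or range> ; <property name> # <comment>
SAMPLE = """\
# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
FE00..FE0F    ; Default_Ignorable_Code_Point # Mn  [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""


def load_property(lines, prop):
    """Collect sorted (start, end) code point ranges listed for `prop` (hypothetical helper)."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments and comment-only lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue  # a different property, or a line with an unexpected shape
        start, _, end = fields[0].partition("..")  # single code point or "XXXX..YYYY"
        ranges.append((int(start, 16), int(end or start, 16)))
    ranges.sort()
    return ranges


def make_lookup(ranges):
    """Return a predicate that binary-searches the (non-overlapping) ranges."""
    starts = [start for start, _ in ranges]

    def lookup(cp):
        i = bisect_right(starts, cp) - 1  # last range whose start is <= cp
        return i >= 0 and cp <= ranges[i][1]

    return lookup


if __name__ == "__main__":
    is_dicp = make_lookup(load_property(SAMPLE.splitlines(), "Default_Ignorable_Code_Point"))
    for cp in (0x200D, 0x0041, 0xFE0F):
        print(f"U+{cp:04X}", is_dicp(cp))
```

To use the full data, one would pass the lines of a locally downloaded copy of DerivedCoreProperties.txt (an assumption here, for example opened with `open(path, encoding="utf-8")`) in place of SAMPLE. The lookup table comes straight from the published derived property, so an implementation inherits any corrections made to it in new versions of the Unicode Character Database. It avoids the divergence risk of re-running the derivation rules locally.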
A user (as opposed to a maintainer) of the Unicode
Standard should see Default_Ignorable_Code_Point as
just another character property and use it at face value.
--Ken