Simple Properties and Derived Properties (was: Re: Normalization of Hangul)

Thu Feb 21 00:33:12 CET 2008

Patrik said:

> I do though in the ruby implementation now use the derived property  
> value Default_Ignorable_Codepoint, and not the base properties. I  
> would like to go back to use the original properties -- at least for  
> the implementation -- so I can really say I have implemented things  
> from the base properties of the Unicode Standard when I am asked  
> whether there are interoperable implementations.

I would not advise doing so. Using Default_Ignorable_Code_Point
is the correct thing to do, and it uses the Unicode
character properties as intended.

The Unicode Standard doesn't actually have a concept of
a "base property" (in part because that term would be
ambiguous for "property of a base character"), but I
think what you are getting at is the distinction made
on p. 89 of the standard between:

D45 Simple property
D46 Derived property

The thing is, however, that a "derived property" in the Unicode
Standard is not a second-class citizen.

Many simple properties in the Unicode Character Database are
defined *only* to contribute to the derivation and/or
stabilization of important derived properties.

Among the first-class *derived* properties important for
implementations are: Uppercase, Lowercase, XID_Start,
XID_Continue, Math, and yes, Default_Ignorable_Code_Point
(all listed in DerivedCoreProperties.txt). Another collection
of derived properties, important to optimized implementations of
normalization, is in DerivedNormalizationProps.txt.

Implementations should *use* the derived properties, and not
try to rederive them from lists of simple properties and
collections of rules, because of the chances for error and
divergence when doing so.
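
As a rough illustration of using a derived property at face value,
here is a minimal Python sketch that reads ranges for one property
from text in the DerivedCoreProperties.txt line format
("codepoint-or-range ; Property # comment"). The names SAMPLE,
load_property, and has_property are made up for this example, and
SAMPLE is a handful of illustrative lines, not the full data file.
A real implementation would read the complete file for the Unicode
version it targets.

```python
# Illustrative lines in the DerivedCoreProperties.txt format.
# This is a small hand-picked sample, NOT the complete data file.
SAMPLE = """\
# Derived Property: Uppercase
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z

# Derived Property: Default_Ignorable_Code_Point
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
115F..1160    ; Default_Ignorable_Code_Point # Lo   [2] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
"""


def load_property(text, prop):
    """Return sorted (first, last) code point ranges listed for `prop`."""
    ranges = []
    for line in text.splitlines():
        # Everything after '#' is a comment; skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        # The first field is either a single code point or "first..last".
        first, _, last = fields[0].partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        ranges.append((lo, hi))
    return sorted(ranges)


def has_property(ranges, ch):
    """True if the single character `ch` falls inside any listed range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


# Use the derived property directly, with no rederivation from simple properties.
DICP = load_property(SAMPLE, "Default_Ignorable_Code_Point")
```

With this in place, has_property(DICP, "\u00ad") answers the question
directly from the published derived data. The code contains no
derivation rules of its own, which is where independent
implementations would otherwise have room to diverge.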

A user (as opposed to a maintainer) of the Unicode
Standard should see Default_Ignorable_Code_Point as
just another character property and use it at face value.

--Ken