Simple Properties and Derived Properties (was: Re: Normalization of Hangul)

Thu Feb 21 01:27:04 CET 2008

Patrik asked:

> > Implementations should *use* the derived properties, and not
> > try to rederive them from lists of simple properties and
> > collections of rules, because of the chances for error and
> > divergence when doing so.
> 
> I understand, sort of, but, I still need to understand. If a derived  
> property is derived, if one do not have interoperable results when  
> calculating it again from the origins, then there is a bug somewhere.  
> I have not seen anything that say what is normative if the calculation  
> is giving a different result than what is in the DerivedProperties.txt  
> file.

O.k., I guess that is the source of the disconnect here.

For the Unicode Standard, the *data* files *are* the normative
values. (This is the opposite of what you have been proposing
for the Annex A list in the protocol tables document.)

If there is some mismatch between the statement of how the
derivation proceeds (which is typically in comments) and
the list actually derived, the *list* prevails.

This is for several reasons:

1. It is far easier to establish the differences between
   versions as simple diffs of lists, than it is to try to
   determine the potential ramifications of possibly complex
   statements of derivations. (It is also much easier to
   QA for each version.)

2. There have been occasional bugs in carrying around the
   documentation of derivations between documentation files
   and comment fields. You saw such an example precisely
   for Default_Ignorable_Code_Point in Unicode 5.0, which
   is why we posted an erratum for the text -- but the
   list of actual code points in the data file was correct.

3. The UTC wants the option to refactor the derivations,
   if new contributory properties are defined, for example.
   This is much easier to do if the *lists* are normative --
   the derivation then can be jiggered until it produces
   the right result, rather than the other way round.
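
To make point 1 concrete, here is a minimal Python sketch of
comparing two versions of a property list as plain set
differences. It assumes the usual UCD line layout
("XXXX ; Property" or "XXXX..YYYY ; Property", with "#"
starting a comment). The names parse_property, diff_versions
and Example_Property are made up for illustration, and the
sample lines are invented, not taken from any real data file.

```python
import re

# "XXXX ; Prop" or "XXXX..YYYY ; Prop" (comments are stripped before matching).
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)

def parse_property(text, prop):
    """Return the set of code points listed for `prop` in UCD-style text."""
    points = set()
    for raw in text.splitlines():
        # Drop the comment part; only the listed values are used.
        line = raw.split("#", 1)[0].strip()
        m = LINE_RE.match(line)
        if not m or m.group(3) != prop:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)
        points.update(range(first, last + 1))
    return points

def diff_versions(old_text, new_text, prop):
    """Return (added, removed) code points between two versions of a list."""
    old = parse_property(old_text, prop)
    new = parse_property(new_text, prop)
    return sorted(new - old), sorted(old - new)

# Invented sample data in the UCD line format (NOT real UCD contents).
OLD = """\
# Derived Property: Example_Property
0041..0043    ; Example_Property # illustrative
00AD          ; Example_Property # illustrative
"""
NEW = """\
0041..0044    ; Example_Property # illustrative
"""

added, removed = diff_versions(OLD, NEW, "Example_Property")
print([f"{cp:04X}" for cp in added], [f"{cp:04X}" for cp in removed])
```

The diff never looks at the derivation statement at all, which
is the point: the comparison is between published lists only.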

> 
> Do you have such rules?

So the rule is: The *list* is normative and is always
right (once published for a particular version).
If there is a mismatch with a statement of rules for
derivation for a particular property, then the
statement is wrong.
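
One way to apply that rule in an implementation is sketched
below in Python. The published list is treated as the
reference, and any rederivation is only something to be
checked against it. The function check_rule and the toy
values are hypothetical, chosen only to show the shape of
the check. They are not real property data.

```python
def check_rule(normative, rule, candidates):
    """Compare a rederivation `rule` against the normative list.

    Returns (missed, spurious): code points in the list that the rule
    fails to produce, and code points the rule produces that are not
    in the list. Any non-empty result means the *rule* is wrong.
    """
    derived = {cp for cp in candidates if rule(cp)}
    missed = sorted(normative - derived)
    spurious = sorted(derived - normative)
    return missed, spurious

def has_property(cp, normative):
    """What an implementation should actually consult: the list itself."""
    return cp in normative

# Toy data (hypothetical): the published list for one version.
normative_list = {0x10, 0x11, 0x20}

# A stated derivation that accidentally omits 0x20.
def stated_rule(cp):
    return 0x10 <= cp <= 0x11

missed, spurious = check_rule(normative_list, stated_rule, range(0x30))
print(missed, spurious)
```

Here the check reports 0x20 as missed, so the statement needs
fixing. Lookups through has_property are unaffected, since
they never depended on the statement in the first place.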

If the statement is wrong in this sense, but the *intent*
of the statement was correct, then the UTC will address
the problem in the *next* version, and update the list,
the statement, or both until they match again. But
that only happens in a *next* version. There can never
be a question about an existing, published version of
the UCD -- the list is always definitive for that version.

--Ken

> 
>     Patrik
> 
>