What rules have been used for the current list of codepoints?

Harald Alvestrand harald at alvestrand.no
Wed Dec 13 23:08:42 CET 2006



--On 13 December 2006 08:45 -0800 Kenneth Whistler <kenw at sybase.com> wrote:

> I have no idea how consensus on this list is measured, but
> *I* am absolutely sure that Lm and Nd need to be added. In
> fact, using the formulation you are using here for rules,
> the whole list of rules should be reconstructed as:
>
> 1. If class is [Ll, Lm, Lo, Mn, Mc, Nd], the code point is ok
> 2. If NFKC(cp) != cp, the code point is not ok
> 3. If lowercase(cp) != cp, the code point is not ok
>
> And that is pretty much exactly what I stated in the November 30
> contribution.
>

Since a character can match rule 1 and also match rule 2 or 3, you have to 
apply rule 1 last.
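
The ordering point can be illustrated with a minimal sketch, assuming Python's unicodedata module as the source of character properties (it reflects whatever Unicode version the interpreter ships, not the 2006 data) and str.lower() as a stand-in for "lowercase(cp)". The names ALLOWED_CLASSES and is_ok are hypothetical, chosen only for this illustration.

```python
import unicodedata

# General categories from rule 1 of the quoted list.
ALLOWED_CLASSES = {"Ll", "Lm", "Lo", "Mn", "Mc", "Nd"}

def is_ok(cp: str) -> bool:
    """Apply the exclusions (rules 2 and 3) first, and rule 1 last."""
    # Rule 2: not stable under NFKC -> not ok.
    if unicodedata.normalize("NFKC", cp) != cp:
        return False
    # Rule 3: not stable under lowercasing -> not ok.
    # Assumption: str.lower() approximates the intended case mapping.
    if cp.lower() != cp:
        return False
    # Rule 1: only now does the general category decide.
    return unicodedata.category(cp) in ALLOWED_CLASSES

# U+02B0 MODIFIER LETTER SMALL H is in class Lm (matches rule 1),
# but NFKC maps it to "h", so rule 2 has to win.
modifier_h_ok = is_ok("\u02b0")
```

If rule 1 were checked first and returned "ok" immediately, U+02B0 would be accepted even though it is not stable under NFKC.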

Your list includes LATIN SMALL LETTER SHARP S - I thought that was unstable 
under NFKC+casemap?
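
Whether sharp s is stable depends on which case mapping is meant. A short check, again assuming Python's unicodedata and string methods as stand-ins (simple lowercasing versus full case folding), shows the difference; this is a sketch, not a statement of what the proposed rules intend.

```python
import unicodedata

sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

# NFKC normalization of the single code point.
nfkc = unicodedata.normalize("NFKC", sharp_s)

# Two candidate readings of "casemap" (assumption: which one the
# rules intend is exactly what is in question here).
lowered = sharp_s.lower()     # plain lowercasing
folded = sharp_s.casefold()   # full case folding

stable_under_lowercase = (nfkc == sharp_s and lowered == sharp_s)
stable_under_casefold = (folded == sharp_s)
```

In current Python versions, lower() leaves sharp s unchanged while casefold() maps it to "ss", so it passes rule 3 as written but is not stable under case folding.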

And personally, I think a rule that permits COMBINING ALMOST EQUAL TO ABOVE 
and MUSICAL SYMBOL COMBINING TREMOLO-1 is useless for this exercise.
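
For reference, the two characters named above can be looked up by name to see how they fare under the three rules. This is a minimal sketch assuming Python's unicodedata; the helper name report is hypothetical.

```python
import unicodedata

NAMES = [
    "COMBINING ALMOST EQUAL TO ABOVE",
    "MUSICAL SYMBOL COMBINING TREMOLO-1",
]

def report(name: str) -> dict:
    """Collect the properties that the three quoted rules look at."""
    cp = unicodedata.lookup(name)
    return {
        "codepoint": f"U+{ord(cp):04X}",
        "category": unicodedata.category(cp),
        "nfkc_stable": unicodedata.normalize("NFKC", cp) == cp,
        # Assumption: str.lower() stands in for "lowercase(cp)".
        "lowercase_stable": cp.lower() == cp,
    }

reports = {name: report(name) for name in NAMES}
```

Both are nonspacing marks (class Mn), which is why a purely class-based rule 1 admits them.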

               Harald




