What rules have been used for the current list of codepoints?

Mark Davis markdavis at google.com
Thu Dec 14 17:25:00 CET 2006


The rules were in the link I sent out, but I'll condense and recap here:

0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:
1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
2. If NFKC(cp) != cp, remove cp
3. If casefold(cp) != cp, remove cp
4. If defaultIgnorableCodePoint(cp), remove cp
5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb, Phnx, Khar,
   Phag, Glag, Shaw, Dsrt, Runr}, remove cp
6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,
   Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
N. If cp is in [-A-Z0-9], add cp
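
For anyone who wants to experiment, the steps above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the tool that produced the actual list. The standard unicodedata module covers general category and NFKC, and str.casefold() stands in for casefold(cp). The standard library has no script, block, or Default_Ignorable_Code_Point lookup, so script_of, block_of and is_default_ignorable are hypothetical caller-supplied functions; with the do-nothing defaults, steps 4 to 6 have no effect. Results also depend on the Unicode version bundled with the Python build.

```python
import unicodedata

ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
EXCLUDED_SCRIPTS = {"Xsux", "Ugar", "Xpeo", "Goth", "Ital", "Cprt", "Linb",
                    "Phnx", "Khar", "Phag", "Glag", "Shaw", "Dsrt", "Runr"}
EXCLUDED_BLOCKS = {"Combining_Diacritical_Marks_for_Symbols",
                   "Musical_Symbols", "Ancient_Greek_Musical_Notation"}
ALWAYS_ALLOWED = set("-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def is_allowed(cp,
               script_of=lambda ch: None,              # hypothetical lookup
               block_of=lambda ch: None,               # hypothetical lookup
               is_default_ignorable=lambda ch: False): # hypothetical lookup
    """Return True if code point cp survives steps 1-6 or is added by step N."""
    ch = chr(cp)
    # Step N runs last in the recap, so it overrides every removal.
    if ch in ALWAYS_ALLOWED:
        return True
    # Step 1: only these general categories are ever added.
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
        return False
    # Step 2: remove anything NFKC would change.
    if unicodedata.normalize("NFKC", ch) != ch:
        return False
    # Step 3: remove anything case folding would change.
    if ch.casefold() != ch:
        return False
    # Steps 4-6: rely on the caller-supplied property lookups.
    if is_default_ignorable(ch):
        return False
    if script_of(ch) in EXCLUDED_SCRIPTS:
        return False
    if block_of(ch) in EXCLUDED_BLOCKS:
        return False
    return True


def allowed_code_points(start=0, stop=0x110000, **lookups):
    """Yield every allowed code point in range(start, stop)."""
    for cp in range(start, stop):
        if is_allowed(cp, **lookups):
            yield cp
```

For example, is_allowed(ord("a")) is true, while is_allowed(0xFB01) (the "fi" ligature) is false because NFKC changes it. Passing real property lookups, for instance ones built from Scripts.txt and Blocks.txt, is what makes steps 4 to 6 take effect.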

Mark

On 12/14/06, Patrik Fältström <patrik at frobbit.se> wrote:
>
> On 14 dec 2006, at 03.21, Kenneth Whistler wrote:
>
> > Mark suggested:
> >
> >    - We've been forgetting to remove default-ignorable-code-points,
> >    so I added an exclusion. It only affects variation selectors.
> >
> > I concur with that. It was going to be my next suggestion to pare
> > away. I had neglected to spot them right away because I had
> > already omitted printing out anything from Plane 14.
>
> Can one of Ken and Mark please post the new algorithm based on these
> latest additions? Based on either Mark's or my rules?
>
>     Patrik



-- 
Mark