What rules have been used for the current list of codepoints?

Kenneth Whistler kenw at sybase.com
Thu Dec 14 23:32:18 CET 2006


To keep up with Mark, I've updated my own table and posted
it as:

http://www.unicode.org/~whistler/SPInclusionList061214.txt

The others I mentioned in earlier postings are still there,
if anyone wants to compare. This latest table synchs up
again with Mark by applying the following rule:

4. If defaultIgnorableCodePoint(cp), remove cp

and by adding Runr (Runic) to the exclusion in rule 5.

--Ken

> Thanks Mark!
> 
> I'll do a new table document tomorrow based on this.
> 
>     Patrik
> 
> On 14 dec 2006, at 17.25, Mark Davis wrote:
> 
> > The rules were in the link I sent out, but I'll condense and recap  
> > here:
> >
> > 0. Start with the empty set. For each code point cp from 0 to  
> > 0x10FFFF:
> > 1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
> > 2. If NFKC(cp) != cp, remove cp
> > 3. If casefold(cp) != cp, remove cp
> > 4. If defaultIgnorableCodePoint(cp), remove cp
> > 5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb,  
> > Phnx, Khar,
> > Phag, Glag, Shaw, Dsrt, Runr}, remove cp
> > 6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,
> > Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
> > N. If cp is in [-A-Z0-9], add cp
> >
> > Mark
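
For anyone who wants to experiment, the quoted rules can be sketched
roughly as follows. This is only an illustration, not the script used
to build any of the posted tables. It assumes Python's standard
unicodedata module, which reflects whatever Unicode version ships with
the interpreter, so results will not match the 2006 tables exactly.
The standard library does not expose Default_Ignorable_Code_Point,
Script, or Block, so rules 4 to 6 are left as hypothetical caller-supplied
predicates (is_default_ignorable, in_excluded_script, in_excluded_block)
that default to "never matches". Because rule N is applied last and only
adds, it is checked first here.

```python
import unicodedata

# Rule 1: general categories that are added to the set.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Rule N: [-A-Z0-9] is always added back at the end.
ALWAYS_ALLOWED = set("-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def never(ch):
    """Placeholder predicate: matches nothing (assumption, see lead-in)."""
    return False


def is_included(cp,
                is_default_ignorable=never,   # rule 4 hook (hypothetical)
                in_excluded_script=never,     # rule 5 hook (hypothetical)
                in_excluded_block=never):     # rule 6 hook (hypothetical)
    """Return True if code point cp survives rules 0-6 and N."""
    ch = chr(cp)
    # Rule N runs last in the original list and only adds, so it wins.
    if ch in ALWAYS_ALLOWED:
        return True
    # Rule 1: only these general categories are ever added.
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:
        return False
    # Rule 2: remove anything changed by NFKC.
    if unicodedata.normalize("NFKC", ch) != ch:
        return False
    # Rule 3: remove anything changed by case folding.
    if ch.casefold() != ch:
        return False
    # Rules 4-6: property lookups delegated to the caller.
    if is_default_ignorable(ch):
        return False
    if in_excluded_script(ch):
        return False
    if in_excluded_block(ch):
        return False
    return True


def inclusion_set(**hooks):
    """Rule 0: walk every code point from 0 to 0x10FFFF."""
    return {cp for cp in range(0x110000) if is_included(cp, **hooks)}
```

Real predicates for rules 4 to 6 would have to come from the Unicode
Character Database files (DerivedCoreProperties.txt, Scripts.txt,
Blocks.txt) or from a library that exposes those properties.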


