What rules have been used for the current list of codepoints?

Patrik Fältström patrik at frobbit.se
Thu Dec 21 16:41:05 CET 2006


I have now implemented the rules below, except rule 4. I have left
rule 4 out because I cannot find that property anywhere in the Perl
Unicode libraries. Google doesn't really help me either: I see
Default_Ignorable_Code_Point listed as an alias for property "Di", and
also some Other_Default...etc., but I cannot find where this is
specified. Not in the category property, at least.

4. If defaultIgnorableCodePoint(cp), remove cp

Can you explain more where this is specified?

So, these are the rules used for the new table:

0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:
1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
2. If NFKC(cp) != cp, remove cp
3. If casefold(cp) != cp, remove cp
5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb, Phnx,  
Khar, Phag, Glag, Shaw, Dsrt, Osma, Ogam}, remove cp
6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,  
Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
N. If cp is in [-A-Z0-9], add cp
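
For illustration only, the rules above could be sketched roughly like
this in Python (my actual implementation is in Perl, so this is not
the code that produced the table). Rule 4 is left out, as explained
above. Python's standard unicodedata module has no script or block
lookup, so script_of and block_of are hypothetical callbacks that the
caller has to supply; the defaults below know nothing and therefore
never trigger rules 5 and 6. All function and constant names are made
up for this sketch.

```python
import string
import unicodedata

ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}
EXCLUDED_SCRIPTS = {
    "Xsux", "Ugar", "Xpeo", "Goth", "Ital", "Cprt", "Linb", "Phnx",
    "Khar", "Phag", "Glag", "Shaw", "Dsrt", "Osma", "Ogam",
}
EXCLUDED_BLOCKS = {
    "Combining_Diacritical_Marks_for_Symbols",
    "Musical_Symbols",
    "Ancient_Greek_Musical_Notation",
}
# Rule N: [-A-Z0-9] is always in the set.
ALWAYS_ALLOWED = set("-" + string.ascii_uppercase + string.digits)


def unknown_property(ch):
    """Placeholder lookup (assumption): knows no script or block."""
    return None


def is_allowed(cp, script_of=unknown_property, block_of=unknown_property):
    """Apply rules 1-3, 5, 6 and N to one code point (rule 4 omitted)."""
    ch = chr(cp)
    if ch in ALWAYS_ALLOWED:                           # rule N, applied last
        return True                                    # so it always wins
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:   # rule 1
        return False
    if unicodedata.normalize("NFKC", ch) != ch:        # rule 2
        return False
    if ch.casefold() != ch:                            # rule 3
        return False
    if script_of(ch) in EXCLUDED_SCRIPTS:              # rule 5
        return False
    if block_of(ch) in EXCLUDED_BLOCKS:                # rule 6
        return False
    return True


def build_table(code_points=range(0x110000), **lookups):
    """Rule 0: start with the empty set and test every code point."""
    return {cp for cp in code_points if is_allowed(cp, **lookups)}
```

Calling build_table(range(0x80)) restricts the run to ASCII, which is
a quick way to check the sketch by eye. The result also depends on the
Unicode version shipped with the Python interpreter, so it need not
match the published table exactly.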

You can find the new table at
http://stupid.domain.name/idnabis/table-latest.html

     Patrik


