What rules have been used for the current list of codepoints?
Patrik Fältström
patrik at frobbit.se
Thu Dec 21 16:41:05 CET 2006
I have now implemented the rules below, except rule 4. I have not
implemented rule 4 because I cannot find that property anywhere in
the Perl Unicode libraries. Google does not really help either: I see
Default_Ignorable_Code_Point listed as an alias for property "Di",
along with some Other_Default...etc., but I cannot find where this is
specified. Not under the category property, at least.
4. If defaultIgnorableCodePoint(cp), remove cp
Can you explain more where this is specified?
So, the rules used for the new table are these:
0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:
1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
2. If NFKC(cp) != cp, remove cp
3. If casefold(cp) != cp, remove cp
5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb, Phnx,
Khar, Phag, Glag, Shaw, Dsrt, Osma, Ogam}, remove cp
6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,
Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
N. If cp is in [-A-Z0-9], add cp
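As a rough illustration only, rules 0 to 3, 5, 6 and N could be sketched in Python as below. This is not the Perl code that generated the table. The names is_allowed, build_table, script_of and block_of are hypothetical. Python's standard unicodedata module has no script or block lookup, so rules 5 and 6 are applied only if the caller supplies those lookups, and the block names are assumed to be spelled as in the rules above. Rule 4 is left out, as it is in the table. The result also depends on the Unicode version bundled with the Python interpreter, so it will not necessarily match the published table.

```python
import unicodedata

# Rule 1: general categories that are added to the set.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Rule 5: scripts whose code points are removed.
EXCLUDED_SCRIPTS = {
    "Xsux", "Ugar", "Xpeo", "Goth", "Ital", "Cprt", "Linb", "Phnx",
    "Khar", "Phag", "Glag", "Shaw", "Dsrt", "Osma", "Ogam",
}

# Rule 6: blocks whose code points are removed.
EXCLUDED_BLOCKS = {
    "Combining_Diacritical_Marks_for_Symbols",
    "Musical_Symbols",
    "Ancient_Greek_Musical_Notation",
}

# Rule N: [-A-Z0-9] is added back at the end, overriding earlier removals.
ASCII_EXCEPTIONS = set("-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def is_allowed(cp, script_of=None, block_of=None):
    """Return True if code point cp survives the rules.

    script_of and block_of are optional, caller-supplied (hypothetical)
    functions mapping a code point to its script code and block name.
    If they are omitted, rules 5 and 6 are skipped.
    """
    ch = chr(cp)
    if ch in ASCII_EXCEPTIONS:                        # rule N
        return True
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:   # rule 1
        return False
    if unicodedata.normalize("NFKC", ch) != ch:       # rule 2
        return False
    if ch.casefold() != ch:                           # rule 3
        return False
    if script_of is not None and script_of(cp) in EXCLUDED_SCRIPTS:  # rule 5
        return False
    if block_of is not None and block_of(cp) in EXCLUDED_BLOCKS:     # rule 6
        return False
    return True


def build_table(code_points=range(0x110000), script_of=None, block_of=None):
    """Rule 0: start from the empty set and test each code point in turn."""
    return {cp for cp in code_points if is_allowed(cp, script_of, block_of)}
```

Because rule N is applied last and only adds code points, checking it first in is_allowed gives the same result as running the rules in sequence.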
You can find the new table at
http://stupid.domain.name/idnabis/table-latest.html
Patrik