What rules have been used for the current list of codepoints?

Patrik Fältström patrik at frobbit.se
Thu Dec 14 22:53:35 CET 2006


Thanks Mark!

I'll do a new table document tomorrow based on this.

    Patrik

On 14 dec 2006, at 17.25, Mark Davis wrote:

> The rules were in the link I sent out, but I'll condense and recap  
> here:
>
> 0. Start with the empty set. For each code point cp from 0 to 0x10FFFF:
> 1. If generalCategory(cp) is in {Ll, Lu, Lo, Lm, Mn, Mc, Nd}, add cp
> 2. If NFKC(cp) != cp, remove cp
> 3. If casefold(cp) != cp, remove cp
> 4. If defaultIgnorableCodePoint(cp), remove cp
> 5. If script(cp) in {Xsux, Ugar, Xpeo, Goth, Ital, Cprt, Linb, Phnx,
> Khar, Phag, Glag, Shaw, Dsrt, Runr}, remove cp
> 6. If block(cp) in {Combining_Diacritical_Marks_for_Symbols,
> Musical_Symbols, Ancient_Greek_Musical_Notation}, remove cp
> N. If cp is in [-A-Z0-9], add cp
>
> Mark
>
> On 12/14/06, Patrik Fältström <patrik at frobbit.se> wrote:
>>
>> On 14 dec 2006, at 03.21, Kenneth Whistler wrote:
>>
>> > Mark suggested:
>> >
>> >    - We've been forgetting to remove default-ignorable-code-points,
>> >    so I added an exclusion. It only affects variation selectors.
>> >
>> > I concur with that. It was going to be my next suggestion to pare
>> > away. I had neglected to spot them right away because I had
>> > already omitted printing out anything from Plane 14.
>>
>> Could either Ken or Mark please post the new algorithm based on these
>> latest additions, starting from either Mark's rules or mine?
>>
>>     Patrik
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>
>
>
> -- 
> Mark
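
For readers who want to experiment with the rules Mark lists above, here
is a minimal Python sketch of them as a per-code-point test. It is an
illustration under stated assumptions, not the code used to generate any
table. Python's unicodedata module has no script, block, or
default-ignorable lookup, so script_of, block_of, and
is_default_ignorable are hypothetical callables the caller must supply;
their defaults make steps 4 to 6 no-ops. The results also depend on the
Unicode version bundled with the Python build, which differs from the
one in use in 2006.

```python
import unicodedata

# Step 1: general categories that admit a code point.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Step 5: scripts to exclude (ISO 15924 codes, as listed in the message).
EXCLUDED_SCRIPTS = {
    "Xsux", "Ugar", "Xpeo", "Goth", "Ital", "Cprt", "Linb",
    "Phnx", "Khar", "Phag", "Glag", "Shaw", "Dsrt", "Runr",
}

# Step 6: blocks to exclude.
EXCLUDED_BLOCKS = {
    "Combining_Diacritical_Marks_for_Symbols",
    "Musical_Symbols",
    "Ancient_Greek_Musical_Notation",
}

# Step N: [-A-Z0-9] is added back after all removals.
ASCII_EXCEPTIONS = set("-ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")


def is_allowed(
    cp,
    script_of=lambda cp: None,               # hypothetical: returns a script code
    block_of=lambda cp: None,                # hypothetical: returns a block name
    is_default_ignorable=lambda cp: False,   # hypothetical: property lookup
):
    """Return True if code point cp survives steps 1-6 and N."""
    ch = chr(cp)
    if ch in ASCII_EXCEPTIONS:                         # step N (applied last, so it wins)
        return True
    if unicodedata.category(ch) not in ALLOWED_CATEGORIES:   # step 1
        return False
    if unicodedata.normalize("NFKC", ch) != ch:        # step 2
        return False
    if ch.casefold() != ch:                            # step 3
        return False
    if is_default_ignorable(cp):                       # step 4
        return False
    if script_of(cp) in EXCLUDED_SCRIPTS:              # step 5
        return False
    if block_of(cp) in EXCLUDED_BLOCKS:                # step 6
        return False
    return True


def build_table(**lookups):
    """Collect every allowed code point from 0 to 0x10FFFF (step 0)."""
    return [cp for cp in range(0x110000) if is_allowed(cp, **lookups)]
```

Because the removals in steps 2 to 6 never re-add anything and step N
comes last, testing each code point independently gives the same set as
running the steps in order. With a real default-ignorable lookup
supplied, the variation selectors mentioned in the quoted text are
rejected at step 4.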


