Q2: What mapping function should be used in a revised IDNA2008 specification?

Harald Alvestrand harald at alvestrand.no
Fri Apr 3 14:44:18 CEST 2009


Mark Davis wrote:
> I modified the program to add a comparison to IDNA2003. I am only 
> including cases where the mapping results in A-Label characters. The 
> numbers within and across rows don't add up as you might expect because 
> of various overlaps and because only mappings to A-Label characters 
> are counted.
>
> Most of the differences between NFKC-CF-RDI and IDNA2003 are new 5.2 
> characters; there are 5 diverging mappings. (As I said before, these 
> figures don't include the current list of special cases: eszett, 
> final_sigma, joiners.)
>
> PValid or Context: 90262
> IDNA2003,    Remapped:    4337
> NFKC-CF-RDI,    Remapped:    5291,    Diverging:    5
> NFKC-LC-RDI,    Remapped:    5225,    Diverging:    77
> NFKC-CF,    Remapped:    4896,    Diverging:    32
> NFKC-LC,    Remapped:    4830,    Diverging:    104
> NFC-CF-RDI,    Remapped:    2486,    Diverging:    2663
> NFC-LC-RDI,    Remapped:    2395,    Diverging:    2754
> NFC-CF,    Remapped:    2091,    Diverging:    2690
> NFC-LC,    Remapped:    2000,    Diverging:    2781
>
> Mark
Mark,

these numbers confuse me a bit - what are you counting?

Is this the result of applying (for instance) NFKC(LC(char)) to each of 
the characters in Unicode, and counting how many got changed?
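
To make the question concrete, here is a rough sketch of the kind of count I have in mind. It is only my guess at the procedure, not Mark's program: the names mapping and count_remapped are made up for illustration, it uses whatever Unicode version the local Python ships with rather than 5.2, and it does not apply the "result must be A-Label characters" filter or the RDI step, so it will not reproduce the figures quoted above.

```python
import sys
import unicodedata


def mapping(ch, form="NFKC", fold=str.lower):
    """Apply the case mapping first, then normalize, i.e. NFKC(LC(char))."""
    return unicodedata.normalize(form, fold(ch))


def count_remapped(form="NFKC", fold=str.lower, code_points=None):
    """Count single code points whose mapped value differs from the input.

    Unassigned, surrogate and private-use code points are skipped.
    """
    if code_points is None:
        code_points = range(sys.maxunicode + 1)
    changed = 0
    for cp in code_points:
        ch = chr(cp)
        if unicodedata.category(ch) in ("Cn", "Cs", "Co"):
            continue
        if mapping(ch, form, fold) != ch:
            changed += 1
    return changed


if __name__ == "__main__":
    # LC corresponds to str.lower here and CF to str.casefold (an assumption
    # about what the row labels in the quoted table mean).
    for form in ("NFKC", "NFC"):
        for label, fold in (("CF", str.casefold), ("LC", str.lower)):
            print(f"{form}-{label}: {count_remapped(form, fold)}")
```

Restricted to ASCII this counts just the 26 uppercase letters, which is what one would expect from lowercasing alone.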

A number of character sequences (the ones with combining marks being the 
most famous ones) are also changed by NFC or NFKC - is there a means of 
counting the impact of that?
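
A small sketch of what I mean by sequences, again with a made-up helper name (changed_only_as_sequence) and Python's unicodedata standing in for whatever tool is actually used: the string below is altered by NFC even though neither of its code points is altered when normalized alone, so a per-character count would never see it.

```python
import unicodedata


def changed_only_as_sequence(s, form="NFC"):
    """True if normalization changes s although each code point alone is stable."""
    each_stable = all(unicodedata.normalize(form, ch) == ch for ch in s)
    return each_stable and unicodedata.normalize(form, s) != s


if __name__ == "__main__":
    # LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT composes to U+00E9.
    print(changed_only_as_sequence("e\u0301"))
    # ANGSTROM SIGN is already changed as a single code point, so it does not
    # count as a sequence-only change.
    print(changed_only_as_sequence("\u212B"))
```

Since the set of possible sequences is unbounded, any such count would have to be taken over a chosen sample of strings rather than over "all of Unicode".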

                 Harald


