Q2: What mapping function should be used in a revised IDNA2008 specification?
Harald Alvestrand
harald at alvestrand.no
Fri Apr 3 14:44:18 CEST 2009
Mark Davis wrote:
> I modified the program to add a comparison to IDNA2003. I am only
> including cases where the mapping results in A-Label characters. The
> numbers within and across rows don't add up as you might expect because
> of various overlaps and because only mappings to A-Label characters
> are counted.
>
> Most of the differences between NFKC-CF-RDI and IDNA2003 are new Unicode 5.2
> characters; there are 5 diverging mappings. (As I said before, these
> figures don't include the current list of special cases: eszett,
> final_sigma, joiners.)
>
> PValid or Context: 90262
> IDNA2003, Remapped: 4337
> NFKC-CF-RDI, Remapped: 5291, Diverging: 5
> NFKC-LC-RDI, Remapped: 5225, Diverging: 77
> NFKC-CF, Remapped: 4896, Diverging: 32
> NFKC-LC, Remapped: 4830, Diverging: 104
> NFC-CF-RDI, Remapped: 2486, Diverging: 2663
> NFC-LC-RDI, Remapped: 2395, Diverging: 2754
> NFC-CF, Remapped: 2091, Diverging: 2690
> NFC-LC, Remapped: 2000, Diverging: 2781
>
> Mark
Mark,
These numbers confuse me a bit - what are you counting?
Is this the result of applying (for instance) NFKC(LC(char)) to every
character in Unicode, and counting how many got changed?
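To make the question concrete, here is a minimal sketch of the count I have in mind. It is only my guess at the procedure, not Mark's program: the function names are hypothetical, the RDI step and the "maps to A-Label characters" filter are not modelled, and the totals depend on the Unicode version behind Python's unicodedata module, so they will not reproduce the figures quoted above.

```python
import sys
import unicodedata


def is_remapped(ch: str, form: str = "NFKC", fold: str = "LC") -> bool:
    """Return True if normalizing the case-mapped code point changes it.

    form: a normalization form name accepted by unicodedata.normalize.
    fold: "LC" for lowercasing, "CF" for case folding (labels assumed
          to match the row names in the quoted table).
    """
    mapped = ch.casefold() if fold == "CF" else ch.lower()
    return unicodedata.normalize(form, mapped) != ch


def count_remapped(form: str = "NFKC", fold: str = "LC") -> int:
    """Count assigned, non-surrogate code points that the mapping changes."""
    count = 0
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Skip surrogates (Cs) and unassigned code points (Cn).
        if unicodedata.category(ch) in ("Cs", "Cn"):
            continue
        if is_remapped(ch, form, fold):
            count += 1
    return count


if __name__ == "__main__":
    for form in ("NFKC", "NFC"):
        for fold in ("CF", "LC"):
            print(f"{form}-{fold}: {count_remapped(form, fold)} remapped")
```

If this is roughly what was counted, it only ever looks at one code point at a time.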
A number of character sequences (the ones with combining marks being the
most famous ones) are also changed by NFC or NFKC - is there a means of
counting the impact of that?
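A small illustration of the sequence case, again only a sketch with a hypothetical helper name: each code point in the string survives normalization on its own, yet the string as a whole is changed, so a survey that visits code points one at a time would never see it.

```python
import unicodedata


def changed_only_as_sequence(s: str, form: str = "NFC") -> bool:
    """True if `s` is changed by normalization although every code point
    in it, taken alone, is left unchanged by the same normalization form."""
    each_stable = all(unicodedata.normalize(form, ch) == ch for ch in s)
    return each_stable and unicodedata.normalize(form, s) != s


# "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
decomposed = "e\u0301"
print(changed_only_as_sequence(decomposed))  # True
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True
```

Since such sequences cannot be enumerated in full, any count of them would presumably have to be taken over a sample of real labels rather than over the code point space.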
Harald