Q2: What mapping function should be used in a revised IDNA2008 specification?
mark at macchiato.com
Fri Apr 3 18:19:17 CEST 2009
What I do is for every Unicode character C:
- apply the mapping (e.g. NFKC + CaseFolding + DefaultIgnorableRemoval) to
C, getting result R; if the following are all true, I increment a
Remapped counter:
  - R != C, and
  - every character in R is a possible U-Label character (PVALID or
    CONTEXTJ or CONTEXTO).
- apply the IDNA2003 mapping to C, getting R'; if the following are
all true, I increment a Diverging counter:
  - IDNA2003 succeeded (there is an R'),
  - R' != R, and
  - every character in R' is a possible U-Label character.
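
For readers who want to experiment, here is a minimal Python sketch of the counting procedure described above. It is not the program that produced the figures quoted below. The names candidate_mapping, idna2003_mapping, and count_mappings are hypothetical, the set of default-ignorable characters and the PVALID/CONTEXTJ/CONTEXTO test must be supplied by the caller (Python's standard library does not expose either), and the standard library's nameprep (Unicode 3.2 tables) is only assumed to be a reasonable stand-in for the IDNA2003 mapping step.

```python
import unicodedata
from encodings.idna import nameprep  # stdlib Nameprep (RFC 3491)


def candidate_mapping(c, ignorable=frozenset()):
    """NFKC + case folding, then drop caller-supplied ignorable characters.

    Assumption: `ignorable` approximates DefaultIgnorableRemoval; the
    standard library has no Default_Ignorable_Code_Point lookup.
    """
    folded = unicodedata.normalize("NFKC", c).casefold()
    folded = unicodedata.normalize("NFKC", folded)  # folding can denormalize
    return "".join(ch for ch in folded if ch not in ignorable)


def idna2003_mapping(c):
    """Return the Nameprep result R', or None when Nameprep rejects the input."""
    try:
        return nameprep(c)
    except UnicodeError:
        return None


def count_mappings(chars, mapping, idna2003, is_label_char):
    """Return (remapped, diverging) over an iterable of characters.

    `is_label_char` is a caller-supplied predicate standing in for
    "PVALID or CONTEXTJ or CONTEXTO".
    """
    remapped = diverging = 0
    for c in chars:
        r = mapping(c)
        if r != c and all(is_label_char(ch) for ch in r):
            remapped += 1
        r_prime = idna2003(c)
        if (r_prime is not None and r_prime != r
                and all(is_label_char(ch) for ch in r_prime)):
            diverging += 1
    return remapped, diverging
```

To run it over all of Unicode, pass (chr(cp) for cp in range(0x110000)) as chars, together with a real PVALID/CONTEXT predicate built from the IDNA2008 tables. Note that an empty R passes the "every character" test vacuously, so characters removed entirely by the mapping are counted as remapped in this sketch.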
On Fri, Apr 3, 2009 at 05:44, Harald Alvestrand <harald at alvestrand.no> wrote:
> Mark Davis wrote:
>> I modified the program to add a comparison to IDNA2003. I am only
>> including cases where the mapping results in A-Label characters. The numbers
>> within and across rows don't add up as you might expect because of various
>> overlaps and because only mappings to A-Label characters are counted.
>> Most of the differences between NFKC-CF-RDI and IDNA2003 are new 5.2
>> characters; there are 5 diverging mappings. (As I said before, these figures
>> don't include the current list of special cases: eszett, final_sigma,
>> PValid or Context: 90262
>> IDNA2003, Remapped: 4337
>> NFKC-CF-RDI, Remapped: 5291, Diverging: 5
>> NFKC-LC-RDI, Remapped: 5225, Diverging: 77
>> NFKC-CF, Remapped: 4896, Diverging: 32
>> NFKC-LC, Remapped: 4830, Diverging: 104
>> NFC-CF-RDI, Remapped: 2486, Diverging: 2663
>> NFC-LC-RDI, Remapped: 2395, Diverging: 2754
>> NFC-CF, Remapped: 2091, Diverging: 2690
>> NFC-LC, Remapped: 2000, Diverging: 2781
> these numbers confuse me a bit - what are you counting?
> Is this the result of applying (for instance) NFKC(LC(char)) for all the
> characters in Unicode, and counting how many got changed?
> A number of character sequences (the ones with combining marks being the
> most famous ones) are also changed by NFC or NFKC - is there a means of
> counting the impact of that?