Data on changes under mapping

Tue Jul 14 02:55:10 CEST 2009

I ran a test to determine the differences between the IDNA2003 mappings and
the proposed mappings. The results are at:

http://www.macchiato.com/unicode/idna/mapping-differences

*Comments on the data:*

Based on the character-frequency information, I think we could live with
a mapping based only on case and width (that is, excluding the other NFKC
forms). The only open issues in the mapping would then be sigma/eszett
and SHOULD vs MUST.
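
For illustration only, here is a rough sketch of what a case+width mapping
could look like. The function names are hypothetical, and per-character
lowercasing is just my stand-in for the "case" step (an assumption, not
what the mapping doc specifies); sigma/eszett are left alone.

```python
import unicodedata

def is_width_variant(ch: str) -> bool:
    # unicodedata.decomposition() returns e.g. "<wide> 0041" for U+FF21;
    # only the <wide> and <narrow> decomposition types count here.
    return unicodedata.decomposition(ch).startswith(("<wide>", "<narrow>"))

def case_width_map(label: str) -> str:
    # Hypothetical helper: map width variants to their NFKC form,
    # lowercase each character, and apply NFC as the last step.
    mapped = []
    for ch in label:
        if is_width_variant(ch):
            ch = unicodedata.normalize("NFKC", ch)
        mapped.append(ch.lower())
    return unicodedata.normalize("NFC", "".join(mapped))
```

With this sketch, fullwidth U+FF21 U+FF22 U+FF23 map to "abc", while a
compatibility character outside the width group, such as the ligature
U+FB01, is left unchanged.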

Note, however, that it is worth reviewing the lists to see if other cases
pop out. The list I posted has the top 10 characters in each group by
frequency. I can post the full list if anyone would make use of it for
review.

*Comments on the mapping doc:*

   1. The fact that NFC MUST go last has already been discussed.
   2. The restriction to dt=narrow or dt=wide works, and it catches the most
   common characters that have different mappings. However, the characters need
   to be transformed not to their decomposition mapping, but to their NFKC
   form: the decomposition mapping must be applied recursively to give the
   correct results. For the characters currently under discussion, it makes
   no difference, but we can't say the same for future characters.
   3. Just like in TABLES, there should be an Exception list of mappings,
   currently empty. This allows for future-proofing, including grandfathered
   mappings and possible idempotence fixes.
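
To make point 2 concrete, here is a small sketch of the difference between
applying a character's decomposition mapping once and taking its NFKC form.
The helper name one_step is hypothetical. The example character U+FE30 has
dt=vertical, so it is outside the narrow/wide group; I use it only because
it shows the recursion.

```python
import unicodedata

def one_step(ch: str) -> str:
    # Apply the character's own decomposition mapping once, without
    # recursing into the resulting characters.
    fields = unicodedata.decomposition(ch).split()
    if not fields:
        return ch  # no decomposition mapping at all
    if fields[0].startswith("<"):
        fields = fields[1:]  # drop the type tag, e.g. "<wide>"
    return "".join(chr(int(cp, 16)) for cp in fields)

# Fullwidth A (dt=wide): one step already equals the NFKC form.
wide_a = "\uff21"
print(one_step(wide_a) == unicodedata.normalize("NFKC", wide_a))

# U+FE30 maps to U+2025, which itself maps to two full stops, so a
# single step stops short of the NFKC form.
leader = "\ufe30"
print(one_step(leader) == unicodedata.normalize("NFKC", leader))
```

For U+FE30 a single step yields U+2025, while NFKC yields "..". Nothing
guarantees that a future narrow or wide character won't have a chain like
that.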

Mark