IDNAbis compatibility

Mark Davis mark.davis at icu-project.org
Fri Mar 16 00:06:55 CET 2007


We did a test run over about a billion documents, looking for hrefs that use
IDNA, and we got the following information:

  changed by ToUnicode, case variant                117,546
  changed by ToUnicode, other mapping difference    240,794
  unchanged by ToUnicode                          1,197,657

The share of IDNs changed by ToUnicode is a rough proxy for the proportion
of IDNs that would become invalid under the current proposals for IDNAbis
(that is, not using case mappings, NFKC, etc.). It is only very rough: this
is preliminary data, and a billion documents is just a sampling of the web.
Nor are we looking at unmapped characters that would be illegal under
IDNAbis.
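
For anyone who wants the counts above as proportions, here is a minimal
sketch of the arithmetic in Python. The dictionary keys and the shares()
helper are illustrative names, not part of the test run; only the three
counts come from the data.

```python
# Counts copied from the preliminary figures in this message.
# The key names are illustrative labels, not official category names.
counts = {
    "changed, case variant": 117_546,
    "changed, other mapping difference": 240_794,
    "unchanged": 1_197_657,
}


def shares(counts):
    """Return each category's fraction of the total number of IDN hrefs."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


total = sum(counts.values())
changed = total - counts["unchanged"]

for name, share in shares(counts).items():
    print(f"{name}: {share:.1%}")
print(f"changed by ToUnicode overall: {changed / total:.1%}")
```

On these numbers, about 23% of the sampled IDN hrefs were changed by
ToUnicode, with the same caveats about roughness as above.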

We'll be doing a more accurate test where we see how many old IDNs in hrefs
would be invalidated by the change to IDNAbis using the current proposed
definitions of IDNAbis character sets and mappings, but we thought people
would like to see the preliminary data, rough as it is.

Mark