mark.davis at icu-project.org
Fri Mar 16 00:06:55 CET 2007
We did a test run over about a billion documents, looking for hrefs that use
IDNA, and we got the following information:

    changed by ToUnicode, case variant                  117,546
    changed by ToUnicode, other mapping difference      240,794
    unchanged by ToUnicode                            1,197,657

This is a rough proxy for the proportion of IDNs that would become invalid
under the current proposals for IDNAbis (that is, not using case mappings,
NFKC, etc.). It is only very rough -- this is preliminary data, and a
billion documents is just a sampling of the web. Nor are we looking at
unmapped characters that would be illegal under IDNAbis.
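
To make the "rough proxy" concrete, here is a small sketch that turns the three counts above into shares of the total. The counts are copied from this message; the variable names and the choice to treat the three categories as the whole population are assumptions made only for illustration.

```python
# Counts copied from the preliminary figures in this message.
counts = {
    "changed, case variant": 117_546,
    "changed, other mapping difference": 240_794,
    "unchanged": 1_197_657,
}

# Assumption: the three categories together are the whole sample of IDN hrefs.
total = sum(counts.values())
changed = total - counts["unchanged"]


def share(n: int) -> float:
    """Return n as a percentage of all counted hrefs."""
    return 100.0 * n / total


for name, n in counts.items():
    print(f"{name:36s} {n:>10,d}  {share(n):5.1f}%")
print(f"{'changed (both kinds)':36s} {changed:>10,d}  {share(changed):5.1f}%")
```

On these figures, about 23% of the counted hrefs are changed by ToUnicode in one way or the other, which is the rough upper-level number the proxy gives.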
We'll be doing a more accurate test where we see how many old IDNs in hrefs
would be invalidated by the change to IDNAbis using the current proposed
definitions of IDNAbis character sets and mappings, but we thought people
would like to see the preliminary data, rough as it is.
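
For anyone who wants to try something similar on their own data, the three categories in the preliminary counts could be approximated per label roughly as follows. This is only a sketch and not the code used for the test run: it assumes that the mapping step of ToUnicode on a non-ASCII label is Nameprep (as in IDNA2003), uses the `nameprep` helper from Python's standard `encodings.idna` module, and treats "case variant" as "the mapped label equals the lowercased original", which is my own simplification. The function name `classify` and the extra "invalid" bucket are hypothetical.

```python
from encodings.idna import nameprep


def classify(label: str) -> str:
    """Classify one Unicode label by how the IDNA2003 mapping changes it."""
    try:
        # Nameprep applies the case-folding map, NFKC, and prohibited-character checks.
        mapped = nameprep(label)
    except UnicodeError:
        # Assumption: labels rejected by Nameprep are counted separately.
        return "invalid"
    if mapped == label:
        return "unchanged"
    # Assumption: a "case variant" differs from its mapped form only by lowercasing.
    if mapped == label.lower():
        return "changed, case variant"
    return "changed, other mapping difference"


if __name__ == "__main__":
    # Lowercase, mixed-case, and fullwidth-initial spellings of the same label.
    for sample in ("b\u00fccher", "B\u00fccher", "\uff42\u00fccher"):
        print(repr(sample), "->", classify(sample))
```

A real run would also need to split host names into labels and handle labels that are already in ACE form, which this sketch leaves out.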