IDNAbis compatibility
Mark Davis
mark.davis at icu-project.org
Fri Mar 16 00:20:29 CET 2007
Actually, one question has come up. It appears that in
http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt no
mappings are being done, thus the "B.1 Commonly mapped to nothing"
characters from RFC 3454 are simply illegal. The only ones that would be
mapped to nothing would be the joiners (subject to context).
Is this the intent?
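
To make the question concrete, here is a minimal sketch using Python's standard `stringprep` module, which carries the RFC 3454 tables. The helper names `strip_b1` and `b1_non_joiners` are hypothetical, introduced only for illustration, and the "illegal unless a joiner" rule is my reading of the draft as described above, not a confirmed rule.

```python
import stringprep

# The two joiners that, per the reading above, would be handled by context.
JOINERS = {"\u200c", "\u200d"}  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER


def strip_b1(label):
    """IDNA2003-style behaviour: silently drop RFC 3454 table B.1 characters."""
    return "".join(ch for ch in label if not stringprep.in_table_b1(ch))


def b1_non_joiners(label):
    """B.1 characters in the label other than the two joiners.

    Under the reading in this message (an assumption), a label containing
    any of these would be rejected instead of being silently cleaned up.
    """
    return [ch for ch in label
            if stringprep.in_table_b1(ch) and ch not in JOINERS]


label = "ex\u00adample"  # contains U+00AD SOFT HYPHEN, a B.1 character
print(strip_b1(label))
print(["U+%04X" % ord(ch) for ch in b1_non_joiners(label)])
```

With no mapping step, the soft hyphen in that label would no longer be dropped silently; the label would have to be rejected.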
Mark
On 3/15/07, Mark Davis <mark.davis at icu-project.org> wrote:
>
> We did a test run over about a billion documents, looking for hrefs that
> use IDNA, and we got the following information:
>
> changed by ToUnicode, case variant                 117,546
> changed by ToUnicode, other mapping difference     240,794
> unchanged by ToUnicode                           1,197,657
>
> This is a rough proxy for the proportion of IDNs that would become invalid
> under the current proposals for IDNAbis (that is, not using case mappings,
> NFKC, etc.). It is only very rough -- this is preliminary data, and a
> billion documents is just a sampling of the web. Nor are we looking at
> unmapped characters that would be illegal under IDNAbis.
>
> We'll be doing a more accurate test where we see how many old IDNs in
> hrefs would be invalidated by the change to IDNAbis using the current
> proposed definitions of IDNAbis character sets and mappings, but we thought
> people would like to see the preliminary data, rough as it is.
>
> Mark
>
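
For anyone who wants to reproduce the three buckets in the quoted message, here is a rough sketch. It assumes the classification compares each label with its IDNA2003 nameprep output (via Python's `encodings.idna.nameprep`); the quoted message does not say how the test was actually run, so the `classify` function and its "invalid" bucket are hypothetical. The counts are the ones quoted above.

```python
from encodings.idna import nameprep


def classify(label):
    """Bucket a single Unicode label by how IDNA2003 nameprep changes it."""
    try:
        mapped = nameprep(label)
    except UnicodeError:
        # Prohibited characters or bidi violations; not one of the three
        # buckets in the quoted message, added here only for completeness.
        return "invalid"
    if mapped == label:
        return "unchanged"
    if mapped == label.lower():
        return "case variant"
    return "other mapping difference"


# Counts as reported in the quoted message.
COUNTS = {
    "case variant": 117546,
    "other mapping difference": 240794,
    "unchanged": 1197657,
}

total = sum(COUNTS.values())
changed = total - COUNTS["unchanged"]
print("%d of %d changed by mapping (%.1f%%)"
      % (changed, total, 100.0 * changed / total))

for sample in ("b\u00fccher", "B\u00fccher", "\uff45xample"):
    print(repr(sample), "->", classify(sample))
```

On the quoted counts, roughly 23% of the IDNs seen were changed by mapping, which is the rough proxy the message refers to.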
--
Mark
More information about the Idna-update
mailing list