IDNAbis compatibility

Mark Davis mark.davis at icu-project.org
Fri Mar 16 00:20:29 CET 2007


Actually, one question has come up. It appears that in
http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt no
mappings are being done, so the "B.1 Commonly mapped to nothing"
characters from RFC 3454 are simply illegal. The only ones that would be
mapped to nothing would be the joiners (subject to context).

Is this the intent?
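
To make the question concrete, here is a minimal sketch that lists which
characters in a label fall in RFC 3454 table B.1. It relies on Python's
standard stringprep module; the helper name mapped_to_nothing and the sample
label are illustrative assumptions, not anything taken from the draft.

```python
import stringprep


def mapped_to_nothing(label: str) -> list[str]:
    """Return the code points in label that RFC 3454 table B.1 maps to nothing."""
    return [f"U+{ord(ch):04X}" for ch in label if stringprep.in_table_b1(ch)]


# Hypothetical sample label: SOFT HYPHEN (U+00AD) and ZERO WIDTH JOINER
# (U+200D) are both listed in table B.1.
sample = "ab\u00adc\u200dd"
print(mapped_to_nothing(sample))
```

Under IDNA2003 these characters are silently dropped by the mapping step; if
no mapping is done, a label containing any of them would have to be rejected
instead, apart from whatever contextual rule is applied to the joiners.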

Mark

On 3/15/07, Mark Davis <mark.davis at icu-project.org> wrote:
>
> We did a test run over about a billion documents, looking for hrefs that
> use IDNA, and we got the following information:
>
>   changed by ToUnicode, case variant                117,546
>   changed by ToUnicode, other mapping difference    240,794
>   unchanged by ToUnicode                          1,197,657
>
> This is a rough proxy for the proportion of IDNs that would become invalid
> under the current proposals for IDNAbis (that is, not using case mappings,
> NFKC, etc.). It is only very rough -- this is preliminary data, and a
> billion documents is just a sampling of the web. Nor are we looking at
> unmapped characters that would be illegal under IDNAbis.
>
> We'll be doing a more accurate test where we see how many old IDNs in
> hrefs would be invalidated by the change to IDNAbis using the current
> proposed definitions of IDNAbis character sets and mappings, but we thought
> people would like to see the preliminary data, rough as it is.
>
> Mark
>
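
As a rough illustration of the three buckets in the quoted message, the
sketch below classifies a single label by comparing it with its Nameprep
form, then computes the share of changed IDNs from the quoted counts. This is
an assumption about how such a classification could be done, not a
description of the test that was actually run: the classify function and its
bucket names are hypothetical, and it uses nameprep from Python's
encodings.idna module rather than ToUnicode.

```python
from encodings.idna import nameprep


def classify(label: str) -> str:
    """Bucket a label by how IDNA2003 Nameprep changes it (illustrative only)."""
    mapped = nameprep(label)  # raises UnicodeError on prohibited characters
    if mapped == label:
        return "unchanged"
    if mapped == label.lower():
        # Assumption: "case variant" means lowercasing alone explains the change.
        return "case variant"
    return "other mapping difference"


# Counts copied from the quoted message.
counts = {
    "case variant": 117_546,
    "other mapping difference": 240_794,
    "unchanged": 1_197_657,
}
total = sum(counts.values())
changed = total - counts["unchanged"]
share_changed = changed / total

print(classify("b\u00fccher"))        # already in mapped form
print(classify("B\u00dcCHER"))        # differs only by case
print(classify("b\u00fc\u00adcher"))  # contains SOFT HYPHEN from table B.1
print(f"{changed:,} of {total:,} changed ({share_changed:.1%})")
```

On the quoted counts, about 23% of the IDNs found were changed by mapping,
which is the rough proxy referred to above.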



-- 
Mark


More information about the Idna-update mailing list