browser behavioral differences for IDNA

Wed Oct 28 20:43:55 CET 2009

IE doesn’t go beyond Unicode 3.2 because that’s when our IDN tables were built ☺  If we rebuild for IDNA2008, then those Chinese mappings will change to the updated mappings.  (Of course we need approved RFCs and a UTS for IDNA2008 first :-)

The reason IE behaves like Corrigendum #5 is that I read the intent of UTR15 and didn’t focus on D2 specifically, thus accidentally avoiding the bug in D2.  (I.e., I got lucky.)  We haven’t updated to support #5 because we never had the bug.  Though one can argue that it was a bug in my implementation that I didn’t have the bug in the standard, I choose to believe they cancelled out ;-)

Also IE depends on the OS APIs (IdnToAscii(), etc.), so theoretically it’s possible that it could differ on different systems.  (Although they all come from the same place, so for now at least they are all the same.)
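
As a rough illustration of what a ToASCII-style call does, here is a minimal sketch using Python's built-in "idna" codec. This is not the Windows IdnToAscii() API; it is a separate IDNA2003 implementation whose tables, as far as I know, are likewise based on Unicode 3.2. The helper name to_ascii and the sample domain are made up for the example.

```python
def to_ascii(domain: str) -> str:
    """Convert a domain name to its ASCII (Punycode) form.

    Uses Python's standard "idna" codec (IDNA2003), standing in here
    for an OS-level call such as IdnToAscii(). Hypothetical helper.
    """
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    # Non-ASCII labels are prepared and Punycode-encoded;
    # all-ASCII labels pass through unchanged.
    print(to_ascii("bücher.example"))
```

Two implementations built from different table versions can return different results for the same input, which is why the table version matters.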

-Shawn

From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Markus Scherer
Sent: Wednesday, October 28, 2009 5:49 AM
To: Martin J. Dürst
Cc: Mark Davis ☕; IDNA update work; Unicode
Subject: Re: browser behavioral differences for IDNA

On Wed, Oct 28, 2009 at 6:56 PM, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:
a) For each corrigendum, at least some of the implementations seem to have adopted it. I very much hope the others will follow. On the IDNAbis WG list, some people pointed out that if there were RFC errata for the Unicode corrigenda, that would help implementations. I can submit some errata if people think that will indeed help.

b) The entry "FF - 3.2 -- applied twice!" confirms what I have been claiming ever since the Normalization Idempotency bug was found: that the IDNA spec (implicitly) assumed that normalization was idempotent, and that different implementations might end up applying normalization once or twice, and thus differ in their results in those cases where (before the corrigendum) normalization wasn't idempotent. All the more reason to apply this corrigendum via an RFC erratum.
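
The idempotency property in question can be sketched as a small check: normalizing once and normalizing twice should give the same string. The snippet below uses Python's unicodedata module, which follows the Unicode version bundled with the interpreter (with the corrigenda applied), so it only illustrates the property; it does not reproduce the pre-corrigendum behavior. The helper names and sample strings are arbitrary choices for the example.

```python
import unicodedata


def normalize_times(text: str, times: int, form: str = "NFKC") -> str:
    """Apply the given normalization form `times` times in a row."""
    for _ in range(times):
        text = unicodedata.normalize(form, text)
    return text


def is_idempotent_on(text: str, form: str = "NFKC") -> bool:
    """True if a second normalization pass leaves the result unchanged."""
    return normalize_times(text, 1, form) == normalize_times(text, 2, form)


if __name__ == "__main__":
    samples = [
        "e\u0301",               # e + combining acute accent
        "\uff21\uff22\uff23",    # fullwidth letters A, B, C
        "\u1100\u1161\u11a8",    # Hangul jamo sequence
    ]
    for s in samples:
        print(ascii(s), is_idempotent_on(s))
```

An implementation that applies normalization once and one that applies it twice agree exactly on the inputs for which this check holds.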

I plan to change ICU soon. Currently, ICU's IDNA code uses a specific, internal flag to make the normalization code behave like before Corrigendum #5 while the regular ICU normalization API works as specified since Corrigendum #5. This fork in the code path is annoying, and the bug nasty although extremely obscure. The different browser behaviors have convinced me that it does not make sense to maintain the old behavior.

The reason that ICU's IDNA implementation explicitly behaves like before Corrigenda #4 & #5 is that, at the time, IETF people wanted normalization behavior to be super-compatible with Unicode 3.2 as originally specified, and we assumed that other IDNA implementations would not update either.

Personally, I would be happy to see errata for the IDNA RFCs for adopting the normalization corrigenda.

BTW, what is Opera doing?

Opera implements Corrigendum #5, like Internet Explorer. I don't know about #4.

Best regards,
markus