looking up domain names with unassigned code points

Erik van der Poel erikv at google.com
Sat May 10 17:22:41 CEST 2008


> Given the security fuss with the introduction of IDNA2003, the browsers
> opted to permit only the permitted names and exclude the "illegal"
> ones, which seems like a sensible approach given the negative
> feedback.

When you say "the browsers", which ones do you mean? I tested IE7 and
Firefox2 with the following domain names that are *already* in
Punycode, and IE7 refused to look up the first 3 (did not emit a DNS
packet according to the sniffer), while Firefox2 looked up all of
them:

(1) <a href="http://xn--nza.com/">
(2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
(3) <a href="http://xn--strae-oqa.com/">
(4) <a href="http://xm--strae-oqa.com/">

(1) contains U+03F8 (a lower-case letter introduced in Unicode 4.0).
(2) contains U+200C (ZWNJ), a character that is being proposed for
IDNA200X; I found this name in the lower left corner of
http://www.nic.ir/List_of_Resellers. (3) contains U+00DF (Eszett),
which has also been discussed recently.

(4) also has Eszett in it, but the prefix has been changed to "xm--".
(I don't want to introduce another prefix, though.)
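
For anyone who wants to check which code points are inside those labels,
here is a minimal sketch using the "punycode" codec from Python's standard
library. The helper name decode_label is mine and purely illustrative. It
strips any 4-character "??--" prefix (so that (4) can be inspected too) and
does no Nameprep or validity checking at all, so it says nothing about
whether a client should look the name up.

```python
import unicodedata

# The four test names from the list above.
NAMES = [
    "xn--nza.com",
    "xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir",
    "xn--strae-oqa.com",
    "xm--strae-oqa.com",
]


def decode_label(label):
    """Punycode-decode a label of the form '??--payload' (hypothetical helper).

    Labels without such a prefix, or whose payload is not valid Punycode,
    are returned unchanged. No Nameprep and no IDNA checks are applied.
    """
    if len(label) < 4 or label[2:4] != "--":
        return label
    try:
        return label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return label


if __name__ == "__main__":
    for name in NAMES:
        print(name)
        for label in name.split("."):
            text = decode_label(label)
            # Only report the non-ASCII code points of each label.
            for ch in text:
                if ord(ch) > 0x7F:
                    print(f"  {label}: U+{ord(ch):04X} "
                          f"{unicodedata.name(ch, '<unnamed>')}")
```

Labels (3) and (4) decode to the same string, since only the prefix differs.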

> It's also completely unclear to me where the standard says
> that one should assume Punycode is safe and just use it.  On the
> contrary, I recall that there were words disallowing illegal xn--
> constructs that weren't valid punycode (granted Punycode is a superset
> of IDNA, but still.)

As far as I know, the only part of RFC 3490 that touches on anomalous
xn-- constructs is steps 3 to 7 of section 4.2:

http://www.ietf.org/rfc/rfc3490.txt

Those steps are part of ToUnicode, which is about display, not lookup.
Does anyone else know of a place in the IDNA2003 RFCs that specifies
whether or not lookup of labels that are already in Punycode is
allowed?
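
To make the ToUnicode steps concrete, here is a rough sketch of steps 3 to 8
of section 4.2 for a label that is already all-ASCII. It leans on the
IDNA2003 helper encodings.idna.ToASCII and the "punycode" codec from
Python's standard library; the function name to_unicode_tail is
hypothetical. This is only my reading of those steps, and it covers display
only. It does not answer the lookup question.

```python
from encodings import idna  # Python's IDNA2003 (RFC 3490) helpers

ACE_PREFIX = "xn--"


def to_unicode_tail(label):
    """Sketch of ToUnicode steps 3-8 for an all-ASCII label.

    ToUnicode never fails: on any error the input is returned unchanged.
    """
    # Step 3: verify the ACE prefix (case-insensitively) and keep a copy.
    saved = label
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        # Step 4: remove the ACE prefix.
        payload = label[len(ACE_PREFIX):]
        # Step 5: decode with Punycode.
        decoded = payload.encode("ascii").decode("punycode")
        # Step 6: apply ToASCII (this runs Nameprep on the decoded text).
        round_trip = idna.ToASCII(decoded).decode("ascii")
    except UnicodeError:
        return label
    # Step 7: the result must match the saved copy, ignoring ASCII case.
    if round_trip.lower() != saved.lower():
        return label
    # Step 8: return the decoded form.
    return decoded


if __name__ == "__main__":
    for test in ("xn--bcher-kva", "xn--strae-oqa", "xm--strae-oqa"):
        print(test, "->", ascii(to_unicode_tail(test)))
```

With this sketch, xn--strae-oqa comes back unchanged: Nameprep maps Eszett to
"ss", so the ToASCII round trip in step 6 no longer matches the saved label
in step 7.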

I think IDNA200X should specify whether that is allowed, and give
clear reasons for the choice, so that client implementors don't
second-guess the RFC authors.

>> Now, as we try to accommodate ZWJ and other characters in IDNA2008, we
>> find that we can no longer assume that those LDH characters will
>> guarantee that old software will look up the domain name. In a sense,
>> IE7 missed one of the main points of the design of IDNA2003.
>
> That's sort of irrelevant at this point :)  IE uses the normalization component,
> which I expect to be updated fairly soon after a new spec is written.  Unless
> the new standard goes beyond the Punycode form and mapping/normalization
> steps in 2003, I'm hoping that we can just swap out the component.
> Of course some users won't get the benefit for some time, but I'm hopeful that
> a large number of users can take advantage of the new standard within a
> reasonable period after its release.

I guess time will tell how smooth the transition from IDNA2003 to IDNA200X is.

Erik

