looking up domain names with unassigned code points

Erik van der Poel erikv at google.com
Sat May 10 21:17:50 CEST 2008


Thanks for responding. I'm not sure what the right answer is, either.
Yes, I was referring to domain names and URIs that are already in
Punycode form, and I agree that the situation in which the app
receives the Unicode form is very different (primarily because of the
unassigned code point issue).

I also agree with your suggestions below. My main concern with simply
letting apps look up domain names that are already in Punycode form is
that some apps may also blindly convert the Punycode to Unicode for
display, without checking for dangerous characters like U+2044
FRACTION SLASH. The TLD registries are under a certain amount of
pressure to only register "safe" names, but at lower levels of the
DNS, there is very little pressure and practically zero enforcement.
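
A minimal sketch of the kind of check an IDNA-aware app could make
before displaying a label that arrived in Punycode form. The names
check_label, display_form and RISKY are hypothetical, the one-entry
denylist stands in for the real IDNA tables, and "unassigned" is
approximated by the Unicode general category Cn; none of this is
taken from the RFCs.

```python
import unicodedata

ACE_PREFIX = "xn--"
# Illustrative denylist only: U+2044 FRACTION SLASH is the example from the
# text above. A real client would consult the full IDNA tables instead.
RISKY = {"\u2044"}


def check_label(label, ucd=unicodedata):
    """Decode one DNS label and list reasons not to display it as Unicode.

    Returns (text, problems). text is None if the label cannot be decoded.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return label, []  # no ACE prefix: nothing to decode
    try:
        text = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return None, ["not valid Punycode"]
    problems = []
    for ch in text:
        if ucd.category(ch) == "Cn":
            problems.append("unassigned code point U+%04X" % ord(ch))
        elif ch in RISKY:
            problems.append("risky character U+%04X" % ord(ch))
    return text, problems


def display_form(label, ucd=unicodedata):
    """Return the Unicode form only when no problem was found."""
    text, problems = check_label(label, ucd)
    return label if problems or text is None else text
```

Passing ucd=unicodedata.ucd_3_2_0 (the Unicode 3.2 snapshot in Python's
standard library, the version IDNA2003 is based on) makes the check
treat U+03F8 in xn--nza as unassigned, while the default, newer
database does not. In both cases display_form falls back to the
Punycode form rather than refusing the lookup.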

However, I don't know how comfortable you and others in the working
group are about writing advice regarding display issues in the


On Sat, May 10, 2008 at 11:46 AM, John C Klensin <klensin at jck.com> wrote:
> Erik (and others),
> I've been silent on this because I'm not sure what the right
> answer is.  Just to be sure, we are talking _only_ about
> situations in which the domain name (or URI) presented to the
> application is already in Punycode form (i.e., it is a putative
> A-label), and not something that is to be converted to an
> A-label by that application.   I believe that the situation in
> which a Unicode string is presented to the application is _very_
> different.
> That uncertainty is driven by two conclusions:
>        * If we were to insist that the punycode form be checked
>        to determine whether it contains unassigned (or
>        DISALLOWED) code points, there is no possible way that
>        IDNA-unaware applications could comply.  Those
>        applications simply don't know that the punycode string
>        is anything but an LDH domain name and there is no way
>        that we can specify that a subset of those domain names
>        be processed in some special way.
>        * For IDNA-aware applications, I believe that,
>        regardless of what we say, some application implementers
>        are going to do what they think best for their users.
>        There are strong arguments in both directions, driven by
>        user safety, performance, code sequence patterns,
>        assumptions about how frequently updates will occur, and
>        a collection of other considerations, many (or most) of
>        which have already been discussed on this list.
> Since I believe that facing reality is generally good, I suggest
> that we:
>        * Add a section to "protocol" that discusses this case.
>        * Specify that the application MAY convert the putative
>        A-label to a U-label, make the check, and reject if
>        UNASSIGNED or DISALLOWED characters are found.
>        * Discuss the tradeoffs as advice about how applications
>        should make the decision.
> Is that plausible?  I think it is consistent with several of the
> suggestions that have been made, especially those that say that
> this one is ultimately an implementation decision.
>    john
> --On Saturday, May 10, 2008 8:22 AM -0700 Erik van der Poel
> <erikv at google.com> wrote:
>>> Given the security fuss with the introduction of IDNA2003,
>>> the browsers opted to permit only the permitted names and
>>> exclude the "illegal" ones, which seems like a sensible
>>> approach given the negative feedback.
>> When you say "the browsers", which ones do you mean? I tested
>> IE7 and Firefox2 with the following domain names that are
>> *already* in Punycode, and IE7 refused to look up the first 3
>> (did not emit a DNS packet according to the sniffer), while
>> Firefox2 looked up all of them:
>> (1) <a href="http://xn--nza.com/">
>> (2) <a href="http://xn--ngb7d.xn--mgbbgcw7khi2840d.xn--mgba3a4f16a.ir/">
>> (3) <a href="http://xn--strae-oqa.com/">
>> (4) <a href="http://xm--strae-oqa.com/">
>> (1) has U+03F8 in it (a lower-case letter introduced in
>> Unicode 4.0), (2) has U+200C (ZWNJ) in it and I found it in
>> the lower left corner of http://www.nic.ir/List_of_Resellers
>> (this character is being proposed for IDNA200X) and (3) has
>> U+00DF (Eszett) in it (also discussed recently).
>> (4) also has Eszett in it, but the prefix has been changed to
>> "xm--". (I don't want to introduce another prefix, though.)
>>> It's also completely unclear to me where the standard says
>>> that one should assume Punycode is safe and just use it.  On
>>> the contrary, I recall that there were words disallowing
>>> illegal xn-- constructs that weren't valid punycode (granted
>>> Punycode is a superset of IDNA, but still.)
>> As far as I know, the only part of RFC 3490 that touches on
>> anomalous xn-- constructs is steps 3 to 7 of section 4.2:
>> http://www.ietf.org/rfc/rfc3490.txt
>> Those steps are part of ToUnicode, which is about display, not
>> lookup. Does anyone else know of a place in the IDNA2003 RFCs
>> that specifies whether or not lookup of labels that are
>> already in Punycode is allowed?
>> I think IDNA200X should specify whether that is allowed, and
>> give clear reasons for the choice, so that client implementors
>> don't second-guess the RFC authors.
>>>> Now, as we try to accommodate ZWJ and other characters in
>>>> IDNA2008, we find that we can no longer assume that those
>>>> LDH characters will guarantee that old software will look up
>>>> the domain name. In a sense, IE7 missed one of the main
>>>> points of the design of IDNA2003.
>>> That's sort of irrelevant at this point :)  IE uses the
>>> normalization component, which I expect to be updated fairly
>>> soon after a new spec is written.  Unless the new standard
>>> goes beyond the Punycode form and mapping/normalization steps
>>> in 2003, I'm hoping that we can just swap out the component.
>>>  Of course some users won't get the benefit for some time,
>>>  but I'm hopeful that a large number of users can take
>>> advantage of the new standard within a reasonable period
>>> after its release.
>> I guess time will tell how smooth the transition from IDNA2003
>> to IDNA200X is.
>> Erik
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
