looking up domain names with unassigned code points

Erik van der Poel erikv at google.com
Sat May 10 03:25:02 CEST 2008


>> Yup, and thanks to IE7's refusing to look up Punycode labels that
>> encode ZWJ, it is now difficult to introduce ZWJ in IDNA2008 without
>> adding a new prefix (in addition to the existing xn--). Maybe you
>> missed that discussion too. Anyway, I'm not suggesting that we add a
>> new prefix to the mix -- I'm just saying that IE7's implementation is
>> now making things a bit more difficult than they would otherwise be.
>
> That doesn't seem like the only thing that's broken in that scenario.
>  The user couldn't read the resulting URL, etc, and we've been trying
> to tell them to be wary of URLs that look like gibberish.  Also I'm a
> bit confused because I thought the idea of the query was to query
> for *new* Unicode characters, not those that were already decided
> to be illegal in a name.

Yes, I added a related topic to the discussion (that of ZWJ, which is
not an unassigned code point). However, there is a common theme here:
that of allowing labels that are *already* in Punycode to be looked
up.

One of the important benefits of sticking to the LDH set when Punycode
was initially designed was that existing software would continue to
"work" -- i.e. look up domain names, even if the display was
suboptimal.
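
To illustrate the LDH property, here is a minimal Python sketch. It
uses the raw "punycode" codec from the standard library and adds the
xn-- prefix by hand. The helper names to_ace_label and is_ldh are my
own, and the sketch deliberately skips nameprep, so it is not a
conforming IDNA2003 implementation; it only shows that the encoded
form stays within letters, digits and hyphen whatever code points the
label contains.

```python
import string

# Letters, digits and hyphen: the only characters old software expects.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh(label: str) -> bool:
    """Return True if the label is non-empty and uses only LDH characters."""
    return bool(label) and all(ch in LDH for ch in label)


def to_ace_label(label: str) -> str:
    """Encode one label with raw Punycode and prepend the xn-- prefix.

    Assumption: no nameprep, no validity checks. A label that is already
    pure LDH is returned unchanged.
    """
    if all(ch in LDH for ch in label):
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # An ordinary non-ASCII label, a label with a heart (U+2665),
    # and a label containing ZWJ (U+200D).
    for label in ["b\u00fccher", "I\u2665NY", "a\u200db"]:
        ace = to_ace_label(label)
        print(repr(label), "->", ace, "LDH only:", is_ldh(ace))
```

Whatever the input, the result is something a client that knows
nothing about IDNA could still hand to the resolver unchanged.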

Now, as we try to accommodate ZWJ and other characters in IDNA2008, we
find that we can no longer assume that a label made up only of LDH
characters will be looked up by old software. In a sense, IE7 missed
one of the main points of the design of IDNA2003.

> The biggest problem is probably "just" that IDNA2003 was a v1,
> so there are now a few kinks to work out, and it's so far behind
> Unicode 5.1.  If it could be brought to a somewhat current
> version of Unicode, then querying for additional characters
> wouldn't be as interesting.  (Because the OS still needs
> font support and all that as well).

Yes, I agree that Unicode 5.1 is a huge step from Unicode 3.2, and
that future versions of Unicode will in some sense be making smaller
and smaller steps in terms of characters that are frequently and
currently used.

However, this is not the point I'm trying to make. I'm saying that all
clients should at least look up any label that is already in Punycode,
so that it will be easier to make future changes to IDNA, such as
allowing labels like I<heart>NY, which may no longer be deemed
problematic in the future. The argument that people cannot type such
characters carries less weight now that many users simply click on
search results.
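
As a rough sketch of the behaviour I am asking for (not a description
of any browser's actual code), a client could decide per label whether
it is already in ACE form and, if so, pass it to the resolver without
decoding or validating what it encodes. The function names
should_look_up_as_is and name_to_query are hypothetical, and the
63-octet limit is the usual DNS label length limit.

```python
import string
from typing import Optional

_LDH = set(string.ascii_letters + string.digits + "-")
_ACE_PREFIX = "xn--"
_MAX_LABEL_LEN = 63  # DNS limit on the length of a single label


def should_look_up_as_is(label: str) -> bool:
    """True if the label is already in ACE form and can be sent unchanged.

    Only the outer shape is checked (prefix, length, LDH characters).
    The label is deliberately NOT decoded, so code points the client
    does not know about (or does not like) cannot block the lookup.
    """
    return (
        label[:4].lower() == _ACE_PREFIX
        and len(_ACE_PREFIX) < len(label) <= _MAX_LABEL_LEN
        and all(ch in _LDH for ch in label)
    )


def name_to_query(hostname: str) -> Optional[str]:
    """Return the name to hand to the resolver, or None if some label
    still needs IDNA processing (which is out of scope for this sketch).
    """
    labels = hostname.rstrip(".").split(".")
    for label in labels:
        plain_ldh = bool(label) and all(ch in _LDH for ch in label)
        if not (plain_ldh or should_look_up_as_is(label)):
            return None
    return ".".join(labels)


if __name__ == "__main__":
    print(name_to_query("xn--bcher-kva.example"))   # passed through untouched
    print(name_to_query("b\u00fccher.example"))     # None: needs encoding first
```

The design choice is that display and lookup are separate decisions: a
client may show the xn-- form if it distrusts the decoded label, but
it still performs the lookup.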

Erik


More information about the Idna-update mailing list