looking up domain names with unassigned code points

Shawn Steele Shawn.Steele at microsoft.com
Mon May 12 18:30:32 CEST 2008

> No. For example, xn--en32g would produce U+110000, which is outside
> the range of valid code points. (The highest code point is U+10FFFF.)

> If an app receives such a punycode string, it should not attempt to
> display the corresponding Unicode (since it is invalid). I'm guessing
> that we can all agree on that. :-)

Well, it does indicate that *some* validation of the resulting Unicode string is necessary.  What happens if there's a U+0020 or U+0007 embedded in it?
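
To make the question concrete, here is a rough sketch in Python of what "some validation" might look like. The function names and the set of rejected general categories are assumptions made for illustration, not anything the IDNA drafts specify. It relies on Python's built-in "punycode" codec, which rejects a label that decodes past U+10FFFF, and on unicodedata.category to catch things like U+0020 (Zs) and U+0007 (Cc).

```python
import unicodedata

ACE_PREFIX = "xn--"

# Assumed policy for this sketch only: controls, surrogates, private use,
# unassigned code points, and separators are treated as not displayable.
REJECTED_CATEGORIES = {"Cc", "Cs", "Co", "Cn", "Zs", "Zl", "Zp"}


def decode_ace_label(label):
    """Return the Unicode form of a label, or None if it cannot be decoded."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # not an ACE label, nothing to decode
    try:
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:
        # UnicodeError is a subclass of ValueError; this covers labels that
        # decode to a value above U+10FFFF as well as malformed input.
        return None


def is_displayable(text):
    """True if the text is non-empty and has no code point in a rejected category."""
    return bool(text) and all(
        unicodedata.category(ch) not in REJECTED_CATEGORIES for ch in text
    )


if __name__ == "__main__":
    for label in ("xn--bcher-kva", "xn--en32g"):
        decoded = decode_ace_label(label)
        ok = decoded is not None and is_displayable(decoded)
        print(label, "displayable" if ok else "rejected")
```

Note that a smiley such as U+263A (category So) passes a check like this one, so a category filter of this kind does not by itself settle which characters are disallowed.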

Note that on the client side it would be required to convert and display the Unicode string if the lookup actually succeeds.  Showing xn--asdfasdf isn't acceptable to the "we want our users to know what they're seeing" crowd.

If the client is required to display a successfully resolved string, then there doesn't seem to be much point in disallowing a smiley face at this (client) level, since anything with a smiley that resolves would be displayed anyway.  That would put the disallowed-character tests at the registration level.

I expected some disagreement with my assertion that some protocols/users will require the Unicode form, so the benefit of looking up Punycode is limited to some specific scenarios, probably leading to inconsistent experiences with "new" names.

- Shawn
