looking up domain names with unassigned code points
Shawn.Steele at microsoft.com
Mon May 12 18:30:32 CEST 2008
> No. For example, xn--en32g would produce U+110000, which is outside
> the range of valid code points. (The highest code point is U+10FFFF.)
> If an app receives such a punycode string, it should not attempt to
> display the corresponding Unicode (since it is invalid). I'm guessing
> that we can all agree on that. :-)
Well, it does indicate that *some* validation of the resulting Unicode string is necessary. What happens if there's a U+0020 or U+0007 embedded in it?
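To make that concrete, here is a rough Python sketch of the kind of post-decoding check I have in mind. The decoder follows the RFC 3492 decoding procedure and keeps code points as plain integers so that an out-of-range value can be inspected instead of crashing the conversion. The function names (decode_punycode, find_problems, check_label) and the particular set of checks are my own illustration, not anything IDNA specifies, and a real client would need far more than this.

```python
# Sketch only: RFC 3492 decoding plus a few illustrative sanity checks.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128

def _digit(ch):
    """Map a Punycode digit (a-z = 0..25, 0-9 = 26..35) to its value."""
    c = ch.lower()
    if "a" <= c <= "z":
        return ord(c) - ord("a")
    if "0" <= c <= "9":
        return 26 + ord(c) - ord("0")
    raise ValueError(f"not a Punycode digit: {ch!r}")

def _adapt(delta, numpoints, first):
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)

def decode_punycode(s):
    """Decode a label (without the xn-- prefix) to a list of integers.
    Python integers do not overflow, so no overflow guard is needed."""
    basic, _, ext = s.rpartition("-")
    output = [ord(c) for c in basic]
    n, i, bias, pos = INITIAL_N, 0, INITIAL_BIAS, 0
    while pos < len(ext):
        oldi, w, k = i, 1, BASE
        while True:
            if pos >= len(ext):
                raise ValueError("truncated Punycode input")
            digit = _digit(ext[pos])
            pos += 1
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - oldi, len(output) + 1, oldi == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, n)
        i += 1
    return output

def find_problems(code_points):
    """Return human-readable complaints; the checks are examples only."""
    problems = []
    for cp in code_points:
        if cp > 0x10FFFF:
            problems.append(f"U+{cp:04X} is above U+10FFFF")
        elif 0xD800 <= cp <= 0xDFFF:
            problems.append(f"U+{cp:04X} is a surrogate")
        elif cp < 0x20 or cp == 0x7F:
            problems.append(f"U+{cp:04X} is a control character")
        elif cp == 0x20:
            problems.append("U+0020 is a space")
    return problems

def check_label(label):
    """Decode an xn-- label if needed, then run the checks."""
    if label.lower().startswith("xn--"):
        return find_problems(decode_punycode(label[4:]))
    return find_problems([ord(c) for c in label])
```

With this sketch, check_label("xn--en32g") reports the out-of-range code point from the quoted example, and find_problems([0x61, 0x20, 0x07]) flags both the space and the BEL.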
Note that the client would be required to convert and display the Unicode string if the lookup actually succeeds. Showing xn--asdfasdf isn't acceptable to the "we want our users to know what they're seeing" crowd.
If the client is required to display a successfully resolved string, then there doesn't seem to be much point in disallowing a smiley face at this (client) level, since any name with a smiley that resolves would be displayed anyway. That would put the disallowed-character tests at the registration level.
I expected some disagreement with my assertion that some protocols/users will require the Unicode form, and that the benefit of looking up punycode is therefore limited to some specific scenarios, probably leading to inconsistent experiences with "new" names.