looking up domain names with unassigned code points
Shawn Steele
Shawn.Steele at microsoft.com
Mon May 12 18:30:32 CEST 2008
> No. For example, xn--en32g would produce U+110000, which is outside
> the range of valid code points. (The highest code point is U+10FFFF.)
> If an app receives such a punycode string, it should not attempt to
> display the corresponding Unicode (since it is invalid). I'm guessing
> that we can all agree on that. :-)
Well, it does indicate that *some* validation of the resulting Unicode string is necessary. What happens if there's a U+0020 or U+0007 embedded in it?
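To make the kind of validation meant here concrete, the following is a minimal Python sketch, not a statement of what IDNA requires. It assumes a Punycode decoder has already produced a list of integer code points. The function name `check_code_points` and the particular set of checks (range, surrogates, control characters, separators) are illustrative assumptions only.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def check_code_points(code_points):
    """Return (code_point, reason) pairs for values that fail the checks.

    Hypothetical helper: the checks are examples of post-decoding
    validation, not the rule set of any IDNA specification.
    """
    problems = []
    for cp in code_points:
        if cp < 0 or cp > MAX_CODE_POINT:
            # e.g. U+110000, which is not a code point at all
            problems.append((cp, "out of range"))
        elif 0xD800 <= cp <= 0xDFFF:
            # surrogates are not valid scalar values
            problems.append((cp, "surrogate"))
        else:
            category = unicodedata.category(chr(cp))
            if category == "Cc":
                # control characters such as U+0007
                problems.append((cp, "control"))
            elif category in ("Zs", "Zl", "Zp"):
                # space and line/paragraph separators such as U+0020
                problems.append((cp, "separator"))
    return problems
```

A caller would refuse to display the Unicode form whenever the returned list is non-empty, for instance for `[0x110000]` or for a label containing U+0020 or U+0007.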
Note that on the client side it would be required to convert and display the Unicode string if the lookup actually succeeds. Showing xn--asdfasdf isn't acceptable to the "we want our users to know what they're seeing" crowd.
If the client is required to display a successfully resolved string, then there doesn't seem to be much point in disallowing a smiley face at this (client) level, since anything with a smiley that resolves would be displayed anyway. That would put the disallowed-character tests at the registration level.
I expected some disagreement with my assertion that some protocols/users will require the Unicode form, and that the benefit of looking up Punycode is therefore limited to some specific scenarios, probably leading to inconsistent experiences with "new" names.
- Shawn