NFKC and dots
Shawn Steele
Shawn.Steele at microsoft.com
Mon Mar 3 20:42:49 CET 2008
I'm a bit confused by the confusion. RFC 3491 clearly states that normalization form KC (NFKC) is to be used. RFC 3490 also clearly says:
3.1...
1) Whenever dots are used as label separators, the following
characters MUST be recognized as dots: U+002E (full stop), U+3002
(ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
(halfwidth ideographic full stop).
So it's very unclear to me why you'd expect 十․com to be treated as anything but two distinct labels.
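To illustrate the reading above, here is a minimal sketch using only Python's standard library. It assumes the dot-like character in the example is U+2024 (ONE DOT LEADER), and it applies NFKC before splitting on the four RFC 3490 dots, which is the interpretation argued here rather than a statement of the exact step order in the RFCs. The helper name split_labels is hypothetical.

```python
import re
import unicodedata

# The four characters RFC 3490 section 3.1 says MUST be recognized as dots.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Hypothetical helper: NFKC-normalize the name, then split it into labels.

    Assumption: normalization is applied before label splitting, which is
    the reading argued in this message.
    """
    normalized = unicodedata.normalize("NFKC", name)
    return IDNA_DOTS.split(normalized)


# U+5341 (the ideograph) followed by U+2024 (ONE DOT LEADER) and "com".
labels = split_labels("\u5341\u2024com")
print(labels)
```

Under that assumption the name comes out as two labels, because NFKC maps U+2024 to U+002E before the split happens.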
ToUnicode(xn--.com-pg0g) would also be expected to fail, because xn-- and com-pg0g are clearly separate labels, so I don't see how xn--.com-pg0g could be "correct". I can see how you might expect this case to fail, since it could be ambiguous if you took each RFC in isolation, but a gibberish value can't be the "correct" encoding.
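A rough sketch of why the string reads as two labels: splitting on U+002E leaves one label that is only the ACE prefix with nothing after it, and a second label that has no ACE prefix at all. The function describe_labels is a hypothetical name used for illustration, and this only inspects the labels; it does not implement ToUnicode.

```python
ACE_PREFIX = "xn--"


def describe_labels(name):
    """Hypothetical helper: split on '.' and report each label's ACE status.

    Returns a list of (label, has_ace_prefix, payload) tuples, where payload
    is the text after the prefix, or None if the label has no ACE prefix.
    """
    result = []
    for label in name.split("."):
        has_prefix = label.lower().startswith(ACE_PREFIX)
        payload = label[len(ACE_PREFIX):] if has_prefix else None
        result.append((label, has_prefix, payload))
    return result


for label, has_prefix, payload in describe_labels("xn--.com-pg0g"):
    print(repr(label), has_prefix, repr(payload))
```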
- Shawn