NFKC and dots

Shawn Steele Shawn.Steele at microsoft.com
Mon Mar 3 20:42:49 CET 2008


I'm a bit confused by the confusion.  RFC 3491 clearly states that normalization form KC (NFKC) is to be used.  RFC 3490 also clearly says:

3.1...

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).


So it's very unclear to me why you'd expect 十․com to be treated as anything but two distinct labels.  NFKC maps U+2024 (one dot leader) to U+002E (full stop), and U+002E is a label separator.
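
For illustration only, here is a minimal Python sketch of that reading.  It assumes the whole name is normalized to NFKC first and only then split on the four dots from RFC 3490 section 3.1; that ordering is the interpretation argued here, not something the sketch proves.  The helper name split_labels is made up for this example.

```python
import re
import unicodedata
from typing import List

# The four label separators listed in RFC 3490, section 3.1.
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(domain: str) -> List[str]:
    """Hypothetical helper: NFKC-normalize the name, then split on the dots.

    Assumption: normalization is applied to the whole name before the
    labels are separated.
    """
    normalized = unicodedata.normalize("NFKC", domain)
    return IDNA_DOTS.split(normalized)


# U+5341, then U+2024 ONE DOT LEADER, then "com".
name = "\u5341\u2024com"

# NFKC turns U+2024 into U+002E, so the name contains a real label separator.
print(unicodedata.normalize("NFKC", "\u2024") == "\u002E")  # prints True
print(split_labels(name))  # prints ['十', 'com']
```

Under that assumption the name comes out as two labels, not as one label that happens to contain a dot.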

ToUnicode(xn--.com-pg0g) would also be expected to fail, because xn-- and com-pg0g are clearly separate labels, so I don't see how xn--.com-pg0g could be "correct".  I could see how you might expect the conversion to fail, since the result could be ambiguous if you took each RFC in isolation, but a gibberish value can't be the "correct" encoding.

- Shawn

