NFKC and dots

Shawn Steele Shawn.Steele at
Mon Mar 3 20:42:49 CET 2008

I'm a bit confused by the confusion.  RFC 3491 clearly states that KC normalization forms are to be used.  RFC 3490 also clearly says:


   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

So it's very unclear to me why you'd expect 十․com to be treated as anything but two distinct labels.
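
To make that reading concrete, here is a minimal Python sketch. It assumes NFKC is applied to the whole name before the name is split on the four RFC 3490 dots; that ordering is an assumption made for illustration, not a quotation of the RFC's step sequence, and `split_labels` is a hypothetical helper name. The example name is U+5341, then U+2024 (ONE DOT LEADER), then "com".

```python
import unicodedata

# The four characters RFC 3490 says MUST be recognized as dots.
RFC3490_DOTS = "\u002e\u3002\uff0e\uff61"


def split_labels(domain):
    """Split a domain name into labels (hypothetical helper).

    Assumption: NFKC runs over the whole string first, so compatibility
    characters such as U+2024 fold to U+002E before the split.
    """
    normalized = unicodedata.normalize("NFKC", domain)
    # Map every recognized dot to U+002E, then split on it.
    for dot in RFC3490_DOTS:
        normalized = normalized.replace(dot, ".")
    return normalized.split(".")


# U+2024 has a compatibility mapping to U+002E under NFKC.
print(unicodedata.normalize("NFKC", "\u2024") == ".")

# Under the assumed ordering the name yields two labels.
print(split_labels("\u5341\u2024com"))
```

If the split were done before normalization instead, U+2024 would not be among the recognized dots and the name would stay a single label, which is where the ambiguity between the two RFCs comes from.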

ToUnicode( would also be expected to fail because xn-- and com-pg0g are clearly separate labels, so I don't see how that could be "correct".  I could see where, perhaps, you could expect this to fail, since it could be ambiguous if you took each RFC in isolation, but it can't have a gibberish value as a "correct" encoding.

- Shawn

More information about the Idna-update mailing list