IAB Statement on Identifiers and Unicode 7.0.0

Shawn Steele Shawn.Steele at microsoft.com
Wed Jan 28 23:55:31 CET 2015


> I'm a little confused by your example.  "I" is disallowed
> entirely in labels that contain non-ASCII characters.   "İ"

Oops, I was a little (lot?) confused: I had a hidden dot I couldn't see and forgot about.  (Not that that makes it any less confusing, I suppose.)

İ (U+0130) maps to i̇ (U+0069 + U+0307), i.e. a small i followed by a combining dot above.  Unfortunately i̇ and i are indistinguishable to me; perhaps they differ on your machine.  So it's not exactly the point I was confusingly trying to make, but it does seem to be a different confusable point.
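
For anyone who wants to see the hidden dot, here is a minimal Python sketch that prints the code points instead of relying on the font. It assumes Python 3, whose str.lower() applies the full Unicode lowercase mapping; the helper name codepoints is made up for illustration.

```python
import unicodedata

def codepoints(s):
    """Return the code points of s in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

dotted_capital = "\u0130"          # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = dotted_capital.lower()   # full lowercase mapping, two code points

print(codepoints(lowered))         # U+0069 U+0307
print(lowered == "i")              # False, even if the two render alike

# Show the character names so the combining mark is visible in the output.
for ch in lowered:
    print(unicodedata.name(ch))
```

The comparison is False even when the two strings look identical on screen, which is the confusable case described above.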

> The answer might be different with IDNA2003, but, as Vint has most recently pointed out, making U-labels and A-labels duals of each other, eliminating cases in which ToUnicode(ToASCII(String)) was not equal String, was a major motivation and primary design goal for IDNA2008.

I guess I remember consistency being important, but I don't remember a requirement that ToUnicode(ToASCII(String)) == String - it seems like any of the case foldings alone would violate that.  E.g., Ô -> ô when testing this rule, and Ô != ô.
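
A rough way to try the Ô example is sketched below. It assumes Python 3's built-in "idna" codec, which implements the IDNA2003 rules (RFC 3490 with Nameprep), not IDNA2008, so it only illustrates the case-folding effect and says nothing about what IDNA2008 permits. The function name round_trip is hypothetical.

```python
def round_trip(label):
    """Apply ToASCII, then ToUnicode, using Python's IDNA2003 codec."""
    a_label = label.encode("idna")   # Nameprep (includes case folding) + Punycode
    return a_label.decode("idna")    # back to a Unicode label

original = "\u00d4"                  # Ô LATIN CAPITAL LETTER O WITH CIRCUMFLEX
result = round_trip(original)

print(result == original)            # False: the capital did not survive
print(result == "\u00f4")            # True: it came back as ô
```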

-Shawn

