IAB Statement on Identifiers and Unicode 7.0.0
Shawn Steele
Shawn.Steele at microsoft.com
Wed Jan 28 23:55:31 CET 2015
> I'm a little confused by your example. "I" is disallowed
> entirely in labels that contain non-ASCII characters. "İ"
Oops, I was a little (a lot?) confused: there was a hidden dot that I couldn't see and had forgotten about. (Not that that makes it any less confusing, I suppose.)
İ (U+0130) maps to i̇ (U+0069 + U+0307), i.e. a small i followed by a combining dot above. Unfortunately i̇ and i are indistinguishable to me; perhaps they differ on your machine. So it's not exactly the point I was confusingly trying to make, but it does seem to be a different confusable case.
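A minimal sketch of that mapping, assuming Python 3, whose built-in str.lower() applies the Unicode full lowercase mapping (this is not an IDNA implementation, only a way to see the code points involved; the variable names are just for illustration):

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
capital_dotted_i = "\u0130"

# Full Unicode lowercasing expands it to two code points.
lowered = capital_dotted_i.lower()

code_points = [f"U+{ord(ch):04X}" for ch in lowered]
names = [unicodedata.name(ch) for ch in lowered]

print(code_points)     # ['U+0069', 'U+0307']
print(names)           # ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']
print(lowered == "i")  # False, even if the two render identically
```

The comparison at the end is the confusable part: the strings differ even when a font draws them the same way.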
> The answer might be different with IDNA2003, but, as Vint has most recently pointed out, making U-labels and A-labels duals of each other, eliminating cases in which ToUnicode(ToASCII(String)) was not equal String, was a major motivation and primary design goal for IDNA2008.
I guess I remember consistency being important, but I don't remember a requirement that ToUnicode(ToASCII(String)) == String. It seems like any of the case foldings alone would violate that. E.g., Ô -> ô when testing this rule, and Ô != ô.
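A rough illustration of that round trip, assuming Python 3's built-in "idna" codec. Note that this codec implements IDNA2003 (RFC 3490 with nameprep), so it only stands in for the case-folding behaviour being discussed and says nothing about what IDNA2008 permits; round_trip is a hypothetical helper name:

```python
def round_trip(label: str) -> str:
    """ToUnicode(ToASCII(label)) using Python's IDNA2003 codec."""
    ascii_form = label.encode("idna")  # nameprep (incl. case folding) + Punycode
    return ascii_form.decode("idna")   # back to a Unicode label


original = "\u00d4"            # Ô, LATIN CAPITAL LETTER O WITH CIRCUMFLEX
result = round_trip(original)  # comes back as ô (U+00F4)

print(result == original)      # False: the capital letter was folded
print(result == "\u00f4")      # True
```

Under a mapping step like this one, the round trip can only be the identity for strings that are already in folded form.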
-Shawn