"This case isn't the important one" (was Re: Visually confusable characters (8))
Shawn.Steele at microsoft.com
Mon Aug 11 20:31:33 CEST 2014
A) From a purely binary standpoint, no mapping was added, which is pretty much what normalization guaranteed: that no binary mappings would change.
B) Linguistic experts have indicated that, despite the confusing name, this is not the same character. I don’t know the language, so I have to defer to those in the UTC who understand it better than I do. However, I have participated in the process and discussion, and I am confident that smart people reached this decision.
So, in my view, nothing’s changed or broken WRT IDN’s use of normalization. Yes, another potentially confusing character combination now exists, but we already have thousands of homographs. So I’m quite confused about what the concern is, since the mappings weren’t broken (at the logical level) and it’s only a homograph (which we already have to deal with).
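For readers who want to see the distinction concretely, here is a minimal sketch in Python using the standard unicodedata module. It contrasts a pair that NFC does unify (precomposed ö vs. o plus a combining diaeresis) with a pair that NFC leaves distinct. The second pair is an assumption: the message never names the code points, and the sketch supposes the character under discussion is U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) against the sequence U+0628 U+0654. The helper name nfc_equal is made up for illustration.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if the two strings are identical after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Canonically equivalent pair: NFC composes o + U+0308 into U+00F6.
O_PRECOMPOSED = "\u00F6"   # LATIN SMALL LETTER O WITH DIAERESIS
O_SEQUENCE = "o\u0308"     # o + COMBINING DIAERESIS

# ASSUMPTION: the character discussed in this thread is U+08A1, which has
# no canonical decomposition, so NFC does not relate it to U+0628 U+0654.
BEH_HAMZA = "\u08A1"             # ARABIC LETTER BEH WITH HAMZA ABOVE
BEH_SEQUENCE = "\u0628\u0654"    # BEH + ARABIC HAMZA ABOVE

print(nfc_equal(O_PRECOMPOSED, O_SEQUENCE))   # mapped together by NFC
print(nfc_equal(BEH_HAMZA, BEH_SEQUENCE))     # left distinct: a homograph only
```

Under that assumption, the first comparison succeeds and the second does not. That is the sense in which no existing mapping changed, and also why the new pair has to be handled as a homograph rather than by normalization.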
To state it again, my concern -- my only concern -- is that this
addition to Unicode appears to be a case where a precomposed
character, which was previously possible to create (for some value of
"possible" and "create") with a combining sequence, is added without
NFC causing the new character and the previous combining sequence to
match. That behaviour is surprising to me given what I understood at
the time we worked on and published IDNA2008. (It is in fact
surprising to me even now when I read the text of the standard, but I
understand the argument that in fact the new character is somehow
unrelated enough to the former combining sequence that the combining
sequence never really worked, but that doesn't matter. I would
probably find that argument more compelling if I understood why this
case is different from ö in Swedish vs. ö in German, but never mind