"This case isn't the important one" (was Re: Visually confusable characters (8))

Mon Aug 11 16:42:59 CEST 2014

On Sun, Aug 10, 2014 at 10:20:02PM -0700, Mark Davis wrote:
> No knowledgeable person ever represented that NFKC would remove all
> confusable characters. 

Even if I spelled that "NFC" to match what I wrote, I cannot see how
you get from what I wrote to "remove all confusable characters".
Nobody has suggested that is the goal, and I'm getting a little
impatient with the tendency of some in this discussion to misrepresent
what those of us who have a concern are saying.

To state it again, my concern -- my only concern -- is that this
addition to Unicode appears to be a case where a precomposed
character, which was previously possible to create (for some value of
"possible" and "create") with a combining sequence, is added without
NFC causing the new character and the previous combining sequence to
match.  That behaviour is surprising to me given what I understood at
the time we worked on and published IDNA2008.  (It is in fact
surprising to me even now when I read the text of the standard.  I do
understand the argument that the new character is somehow unrelated
enough to the former combining sequence that the combining sequence
never really worked, but that doesn't matter here.  I would
probably find that argument more compelling if I understood why this
case is different from ö in Swedish vs. ö in German, but never mind
that, either.)
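
For readers who want to see the distinction concretely, here is a
minimal sketch using Python's standard unicodedata module.  The
helper name nfc_equal is made up for illustration.  The first pair is
the familiar ö case, where NFC makes the precomposed character and
the combining sequence match.  The second pair is an assumption on my
part: this message does not name the code point, and I am taking it
to be U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) compared against
U+0628 followed by U+0654.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Return True if a and b are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Familiar case: precomposed o-with-diaeresis vs. "o" + COMBINING DIAERESIS.
# NFC composes the sequence into U+00F6, so the two spellings match.
O_DIAERESIS = "\u00f6"
O_PLUS_COMBINING = "o\u0308"

# Assumed case (the code points are NOT named in this message):
# U+08A1 vs. ARABIC LETTER BEH (U+0628) + ARABIC HAMZA ABOVE (U+0654).
# U+08A1 has no canonical decomposition, and BEH + HAMZA ABOVE has no
# canonical composite, so NFC leaves both spellings as they are.
BEH_HAMZA = "\u08a1"
BEH_PLUS_HAMZA = "\u0628\u0654"

if __name__ == "__main__":
    print(nfc_equal(O_DIAERESIS, O_PLUS_COMBINING))
    print(nfc_equal(BEH_HAMZA, BEH_PLUS_HAMZA))
```

Under those assumptions the first comparison should report a match
and the second should not, which is the kind of asymmetry I am asking
about.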

What is important at least for me now is to understand the extent to
which this sort of thing happens, what our expectation ought to be in
the future about its recurrence, and what implications that has for
how we build network protocols atop Unicode.

Best regards,

A

-- 
Andrew Sullivan
ajs at anvilwalrusden.com