"This case isn't the important one" (was Re: Visually confusable characters (8))

Andrew Sullivan ajs at anvilwalrusden.com
Mon Aug 11 21:50:25 CEST 2014

Hi Ken,

First, thanks for this.  Your message clarified a number of things for
me.

On Mon, Aug 11, 2014 at 07:28:50PM +0000, Whistler, Ken wrote:
> For Orok.  For some value of "possible" and some value of "create", that
> was "the same" as the existing U+043D CYRILLIC LETTER EN + 
> U+0321 COMBINING PALATALIZED HOOK BELOW (which takes different
> positions, depending on the base letter it attaches to). 

Ok, this is helpful.  Thanks.

> I am guessing that you have taken on board the Unicode formal non-equivalence
> of these "precomposed" characters that have diacritics attached
> or overlaying the base letters, even though *logically* these
> diacritic modifications are, at another level the "same" as the
> base character plus the application of the combining diacritic
> in question.

No, not quite; this is exactly the problem.  While I accept and
believe I understand Unicode's decision that these are formally
non-equivalent, I think that for IDNA purposes that may not be good.
And this non-equivalence is not consistent with what I believed NFC
was supposed to get us for IDNA purposes.  That there are now these
other examples (thank you for them) makes the problem worse, not
better.

> In any case, at this point I find it surprising that anyone who had
> been paying close attention to Unicode for the last 15 years or so
> would find the regular (not common, not rare) introduction of these
> kinds of letters into the standard to actually be surprising.

What is surprising to me is the difference between what I thought
happened with normalization in the cases where precombined characters
were to be added, and what actually happens.  I don't mean "I am
surprised" as a snide way of saying "You are wrong."  It's just a
genuine expression of surprise.  It's entirely
possible (it wouldn't surprise me at all) that I was utterly confused
about the way Unicode works.  I've been working intimately with the
DNS since the early 2000s, and I continue to be surprised by it too.
Probably others are more clever than I am and are less often mistaken.
My apologies.
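
To make the normalization point concrete, here is a minimal sketch
using Python's standard unicodedata module.  The helper name nfc_equal
and the two comparison cases (U+0439 and U+00F8) are illustrative
choices that do not come from this thread; only the U+043D + U+0321
sequence is taken from the quoted message.  It shows that NFC unifies a
precomposed character with its combining sequence only when the
precomposed character has a canonical decomposition.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Case 1: a precomposed letter WITH a canonical decomposition.
# U+0439 CYRILLIC SMALL LETTER SHORT I decomposes to U+0438 + U+0306,
# so NFC treats the two spellings as the same string.
short_i_precomposed = "\u0439"
short_i_sequence = "\u0438\u0306"
print(nfc_equal(short_i_precomposed, short_i_sequence))  # True

# Case 2: a letter that looks "precomposed" but has NO canonical decomposition.
# U+00F8 LATIN SMALL LETTER O WITH STROKE is not equivalent to
# "o" + U+0338 COMBINING LONG SOLIDUS OVERLAY, so NFC leaves them distinct.
o_stroke = "\u00f8"
o_sequence = "o\u0338"
print(nfc_equal(o_stroke, o_sequence))  # False

# Case 3: the sequence from the quoted message, U+043D + U+0321.
# No character decomposes to this pair, so NFC leaves it as two code points.
en_hook_sequence = "\u043d\u0321"
nfc_len = len(unicodedata.normalize("NFC", en_hook_sequence))
print(nfc_len)  # 2
```

The second and third cases are the kind at issue here: a protocol that
relies on NFC alone would see two different labels where a reader may
see one.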

> It will recur. These kinds of situations are built into the structure
> of a number of scripts -- most notably Latin, Cyrillic, and Arabic.
> They should not be surprises.

That's good to know.  Now we have to figure out what the consequences
are for protocols.

Best regards,


Andrew Sullivan
ajs at anvilwalrusden.com
