"This case isn't the important one" (was Re: Visually confusable characters (8))
ajs at anvilwalrusden.com
Mon Aug 11 21:50:25 CEST 2014
First, thanks for this. Your message clarified a number of things for me.
On Mon, Aug 11, 2014 at 07:28:50PM +0000, Whistler, Ken wrote:
> U+0529 CYRILLIC LETTER EN WITH LEFT HOOK
> For Orok. For some value of "possible" and some value of "create", that
> was "the same" as the existing U+043D CYRILLIC LETTER EN +
> U+0321 COMBINING PALATALIZED HOOK BELOW (which takes different
> positions, depending on the base letter it attaches to).
Ok, this is helpful. Thanks.
> I am guessing that you have taken on board the Unicode formal non-equivalence
> of these "precomposed" characters that have diacritics attached
> or overlaying the base letters, even though *logically* these
> diacritic modifications are, at another level the "same" as the
> base character plus the application of the combining diacritic
> in question.
No, not quite; this is exactly the problem. While I accept and
believe I understand Unicode's decision that these are formally
non-equivalent, I think that for IDNA purposes that may not be good.
And this non-equivalence is not consistent with what I believed NFC
was supposed to get us for IDNA purposes. That there are now these
other examples (thank you for them) makes the problem worse, not better.
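
To make the gap concrete, here is a minimal Python sketch using the standard unicodedata module. The helper name nfc_equal is made up for illustration, and the contrasting SHORT I example is my own addition, not something from this thread. It shows that NFC unifies a pair that has a canonical decomposition but leaves U+0529 and U+043D + U+0321 as distinct strings. The result depends on the Unicode data shipped with the interpreter, so treat it as an illustration rather than a statement about any IDNA implementation.

```python
import unicodedata

# The pair discussed above: no canonical decomposition links them.
PRECOMPOSED = "\u0529"        # EN WITH LEFT HOOK (small letter)
SEQUENCE = "\u043d\u0321"     # EN + COMBINING PALATALIZED HOOK BELOW

# A contrasting pair (illustrative assumption, not from the thread):
# U+0439 SHORT I has a canonical decomposition to U+0438 + U+0306.
SHORT_I = "\u0439"
SHORT_I_SEQUENCE = "\u0438\u0306"


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    # NFC composes the SHORT I sequence, so these compare equal.
    print("short i pair equal under NFC:", nfc_equal(SHORT_I, SHORT_I_SEQUENCE))
    # NFC leaves both forms of the hooked EN untouched, so they differ.
    print("hooked en pair equal under NFC:", nfc_equal(PRECOMPOSED, SEQUENCE))
```

A registry or protocol that relies on NFC alone would therefore treat the two hooked-EN forms as different labels, which is the concern here.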
> In any case, at this point I find it surprising that anyone who had
> been paying close attention to Unicode for the last 15 years or so
> would find the regular (not common, not rare) introduction of these
> kinds of letters into the standard to actually be surprising.
What is surprising to me is the difference between what I thought
happened with normalization in the cases where precombined characters
were to be added, and what actually happens. I don't mean "I am
surprised" in the sense that is really a snide way of saying, "You are
wrong." It's just a genuine expression of surprise. It's entirely
possible (it wouldn't surprise me at all) that I was utterly confused
about the way Unicode works. I've been working intimately with the
DNS since the early 2000s, and I continue to be surprised by it too.
Probably others are more clever than I am and are less often mistaken.
> It will recur. These kinds of situations are built into the structure
> of a number of scripts -- most notably Latin, Cyrillic, and Arabic.
> They should not be surprises.
That's good to know. Now we have to figure out what the consequences
are for protocols.
ajs at anvilwalrusden.com