Unicode 7.0.0, (combining) Hamza Above, and normalization

Andrew Sullivan ajs at anvilwalrusden.com
Fri Aug 8 02:27:45 CEST 2014

On Thu, Aug 07, 2014 at 10:56:51PM +0000, Whistler, Ken wrote:
> The linguistic grounds are now basically irrelevant to the *current*
> discussion. My assertion is that U+08A1 beh-with-hamza is *NOT*
> the same as the sequence beh + combining Hamza. And that assertion
> can be derived from the decisions and the data published by the UTC
> about the encoding.

I think everyone agrees on this, because it's very close to a
tautology: they're not the same character by definition, because they
don't normalize to the same thing and they're not the same code
points.  For practical purposes, that fact about the world doesn't
really clarify matters.
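
To make the normalization point concrete, here is a minimal sketch using
Python's standard unicodedata module.  The helper name same_under and the
constant names are mine, chosen for illustration only.  The alef case is
included purely for contrast: U+0623 has a canonical decomposition to
U+0627 U+0654, whereas U+08A1 has none.  The results depend on the version
of the Unicode Character Database bundled with the Python in use.

```python
import unicodedata

# Code points under discussion.
BEH_WITH_HAMZA = "\u08A1"         # ARABIC LETTER BEH WITH HAMZA ABOVE (new in Unicode 7.0.0)
BEH_PLUS_HAMZA = "\u0628\u0654"   # ARABIC LETTER BEH + ARABIC HAMZA ABOVE (combining)

# Contrast case: an older precomposed letter that does decompose canonically.
ALEF_WITH_HAMZA = "\u0623"        # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_PLUS_HAMZA = "\u0627\u0654"  # ARABIC LETTER ALEF + ARABIC HAMZA ABOVE (combining)


def same_under(form, a, b):
    """Hypothetical helper: do a and b match after normalizing to `form`?"""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(
        form,
        "beh:", same_under(form, BEH_WITH_HAMZA, BEH_PLUS_HAMZA),
        "alef:", same_under(form, ALEF_WITH_HAMZA, ALEF_PLUS_HAMZA),
    )
```

Under every normalization form the beh pair stays distinct while the alef
pair becomes identical, which is the asymmetry at issue here.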

> All of this discussion seems to be boiling down to IETF second-guessing
> of Unicode character encoding decisions and complaints about Unicode
> normalization not satisfying expectations based on rather simplistic
> notions of which things that look the same should *be* the same.

I don't think that's a fair characterization.  Nobody is
"second-guessing" anything.  It's rather that we -- John, actually --
discovered a property of this case that we did not previously
understand, and it is an uncomfortable one for the way we had relied
on Unicode, because Unicode didn't work the way we thought.  That's
hardly surprising, but it's an important new discovery and we have to
understand its consequences.

> for IDNA because of a one-off quibble about encoding decisions
> made by the UTC and normalization just *increases* the overall
> complexity and level of confusion about application of the protocol.

Well, maybe and maybe not.  Some of the users of this protocol are
naïve users of it -- they don't even know they're using a protocol.
It might be (I don't yet have an opinion) that doing things in a way
that is less likely to lead to attacks against those people is worth
making either the protocol or the protocol-implementation advice more
complicated.  Presumably, implementers have a greater reason to become
familiar with the picky exceptional cases.

Best regards,


Andrew Sullivan
ajs at anvilwalrusden.com
