IAB Statement on Identifiers and Unicode 7.0.0

Andrew Sullivan ajs at anvilwalrusden.com
Wed Jan 28 21:48:20 CET 2015

On Wed, Jan 28, 2015 at 07:42:07PM +0000, Shawn Steele wrote:

>  Exactly where the misunderstanding lies.  From an IETF point of
> view it was possible to combine X and Q and to make something that
> looked like Z.  However Z != X + Q, so a different character was
> added "Z".  There never was a way to create U+08A1 before because
> it's not the same thing as what IETF thinks you can do to create it.

I think we all understand that, in the sense of Unicode abstract
characters.  In fact, that is true by definition: the very fact that
UTC has created a separate, canonically different code point means by
definition that the character is a different abstract character.

What at least some of us are trying to say is that, whereas in many
cases we can find a code point property that allows us to make the
distinction, in this case there _isn't_ such a property.  That
surprised us, and it turns out to have implications for the way we
have been developing protocols.
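
To make that concrete, here is a rough sketch in Python using the
standard unicodedata module.  It assumes the sequence under discussion
is U+0628 followed by U+0654 (taken from the context of the IAB
statement, not from the text quoted above), and the helper name
canonically_equivalent is just something made up for illustration.
Results depend on the Unicode database version bundled with the
interpreter.

```python
import unicodedata

# Assumed code points for illustration (from the IAB statement context).
BEH_HAMZA = "\u08A1"             # ARABIC LETTER BEH WITH HAMZA ABOVE
BEH_PLUS_HAMZA = "\u0628\u0654"  # BEH + combining HAMZA ABOVE
ALEF_HAMZA = "\u0623"            # ARABIC LETTER ALEF WITH HAMZA ABOVE
ALEF_PLUS_HAMZA = "\u0627\u0654" # ALEF + combining HAMZA ABOVE


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The usual case: the precomposed letter carries a canonical decomposition,
# so normalization treats both spellings as the same thing.
print(repr(unicodedata.decomposition(ALEF_HAMZA)))
print(canonically_equivalent(ALEF_HAMZA, ALEF_PLUS_HAMZA))

# The case at issue: U+08A1 has no decomposition mapping, so normalization
# leaves the two spellings distinct.
print(repr(unicodedata.decomposition(BEH_HAMZA)))
print(canonically_equivalent(BEH_HAMZA, BEH_PLUS_HAMZA))
```

In the first pair the decomposition property tells a protocol that the
two spellings are equivalent; in the second pair nothing in the
normalization data relates them.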

Again, nobody is trying to say that UTC has this wrong or is confused
or anything.  We're simply focussed on the immediate-term problem
that, for the kind of thing we were trying to do with these
characters, they don't work the way we thought they did.  To me, it
would be better to work together to find a way to get the
information we need to avoid such misunderstandings in future.

Best regards,


Andrew Sullivan
ajs at anvilwalrusden.com

More information about the Idna-update mailing list