IAB Statement on Identifiers and Unicode 7.0.0

Thu Jan 29 04:02:59 CET 2015

--On Thursday, January 29, 2015 01:02 +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> I'm confused.  Mark seems to be very clearly saying that
> "there are more egregious ambiguities in IDNA, but the
> discussion is spending a lot of energy about the 1% case
> rather than the bigger problems."

Shawn,

This is getting tedious.  Vint has explained, Andrew has
explained, Pete has explained, Patrik has explained, and I have
explained, each in different ways (and my apologies to anyone
I've left out), that the examples in the IAB statement are not
the problem.  They are symptoms of what may be a fundamental
misunderstanding in the IDNA design (specifically that we may
not have used the right set of properties) and perhaps an even
more fundamental one (specifically that the necessary set of
properties may not exist or be complete).

A stray problem code point (or three or 10) would not really be
a fundamental issue.  I'd rather deal with them than say "there
are lots of other problems, so we can ignore this one too... or
all of them" but we could quite easily deal with them, worst
case, by creating an exception list somewhere.  I don't know
that I'd feel the same way about a thousand or so code points
(which would be the 1% case if you didn't mean that entirely
metaphorically), but this isn't the issue either.

The thing that is producing this very high level of concern is a
belief that we designed the rule and property structure of IDNA
on the basis of a series of assumptions, one of which was that
NFC would resolve virtually all of the different ways of coding
a single grapheme cluster within a given script (see Andrew's
note and others for the relevant disclaimers).  We did not make
that assumption up but were advised that things worked that way.
Another was that the guidelines and stability rules in the
standard could be taken seriously, another subject on which we
received strong advice.
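
To make the NFC assumption concrete, here is a minimal Python
sketch.  It is an illustration only, not text from the I-D: the
helper name nfc_unifies is hypothetical, and the Arabic pair is
my assumption of a representative case (U+08A1, added in Unicode
7.0, against BEH followed by HAMZA ABOVE).  It checks whether
NFC maps two codings of what looks like the same character to
the same string.

```python
import unicodedata


def nfc_unifies(a: str, b: str) -> bool:
    """Return True if NFC maps both codings to the same string."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Latin: precomposed e-acute vs. "e" + COMBINING ACUTE ACCENT.
latin_precomposed = "\u00e9"
latin_sequence = "e\u0301"

# Arabic: U+08A1 BEH WITH HAMZA ABOVE vs. BEH + COMBINING HAMZA ABOVE.
# U+08A1 has no canonical decomposition, so NFC leaves both forms as-is.
arabic_single = "\u08a1"
arabic_sequence = "\u0628\u0654"

print(nfc_unifies(latin_precomposed, latin_sequence))  # True
print(nfc_unifies(arabic_single, arabic_sequence))     # False
```

The Latin pair is the behavior the assumption expected of every
such pair within a script; the Arabic pair is a case where it
does not hold.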

If those assumptions are seriously wrong in ways that correspond
to real-world cases, then we need to do something about it.  See
either Andrew's list from (I think) earlier today or the list in
the I-D for possibilities; both lists include deciding that the
problem really isn't worth dealing with, but that needs to be a
decision.  But, again, and for what feels like the 20th time,
the issue isn't a list of characters; it is what to do about
apparent defects in the IDNA rule and derived property structure.

Finally, since I'm trying to find time to work on
draft-klensin-idna-5892upd-unicode70-04 to create a version that
reflects our best current understanding, I feel a need to
comment on your assertion that the I-D clearly did not have
consensus.  The document essentially consists of two parts.
One explains the problem.  Based on discussions over the last
week or so, that explanation and the dimensions of the problem
it describes are clearly (sic) inadequate, but no one has yet
said that, as far as they go, they are wrong.  The other part
identifies several possible approaches to the problem without
recommending any of them.  So I don't even know what it would
mean to have consensus on that document, much less what your
assertion that no consensus exists means.  I am left with some
doubt that you actually read it but, if you have not, please
wait for -04 which will at least have a more complete problem
description.

best,
    john