IAB Statement on Identifiers and Unicode 7.0.0

Wed Jan 28 21:02:16 CET 2015

--On Wednesday, January 28, 2015 19:20 +0000 Michel Suignard
<michel at suignard.com> wrote:

>>> ...
>>> If IAB and IETF want to pursue a path to eliminate these
>>> pseudo-equivalences, they have to build on top of NFC a
>>> more restrictive transform.
>> 
>> That is certainly one direction in which we might head but I,
>> and I think several others here, would like to avoid it if
>> possible.
> 
> Isn't that what IDNA2008 does already? As mentioned by Asmus,
> Unicode is already full of 'derived' properties. It is much
> easier to build on top of an existing transform such as NFC
> than to create your own from scratch and it is also easier to
> update when the repertoire is updated.

Agreed.  The question (see the last part of my long note) is
whether the right set of properties is now available to do what
is needed.  I suggest that a different conclusion from the
recent discoveries is that IDNA2008 didn't use quite the right
set of properties (basic or derived) and that at least one
important reason why it didn't is that a sufficient set of
properties was not available.

> To clarify, I was not advocating for a brand new Normalization
> Form built from scratch, but a derived form. I understand that
> it would have been easier to assume that NFC does the trick
> for you, but if you perceive needs that are not addressed by
> NFC, I don't see how you can avoid a solution analogous to what
> is done in IDNA2008, that is NFC + special rules. Then in the
> future IDNA20xx (and any other application with similar needs)
> can adopt that transform.

Speaking personally, I think that is where we are headed (and I
think draft-klensin-idna-5892upd-unicode70-03 includes at least
two variations on that theme in its list of alternatives).  The
difficulty is that, if the "special rules" are going to be based
on Unicode properties, then, unless there are basic or derived
properties I haven't found, there is some new property work that
really should be done in UTC, not IETF (again, see my longer
note).
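
As a rough illustration of what "NFC + special rules" could look
like in practice, here is a minimal Python sketch.  It is not taken
from IDNA2008 or from the draft: the function name nfc_plus_rules
and the EXTRA_FOLDS table are hypothetical, and the single entry
(U+0628 U+0654 versus U+08A1, the case that appeared with Unicode
7.0.0) is only there to show a pair that NFC leaves distinct.
Mapping one form to the other is just one possible kind of special
rule; disallowing one of them would be another.

```python
import unicodedata

# Hypothetical table of extra rules applied on top of NFC.
# Assumption for illustration only: fold BEH + HAMZA ABOVE into
# the precomposed BEH WITH HAMZA ABOVE, which NFC does not do
# because U+08A1 has no canonical decomposition.
EXTRA_FOLDS = {
    "\u0628\u0654": "\u08A1",
}


def nfc_plus_rules(label: str) -> str:
    """Apply NFC, then the (hypothetical) extra folding rules.

    This is a naive substring replacement; it does not handle
    other combining marks reordered between the two code points.
    """
    normalized = unicodedata.normalize("NFC", label)
    for source, target in EXTRA_FOLDS.items():
        normalized = normalized.replace(source, target)
    return normalized


if __name__ == "__main__":
    sequence = "\u0628\u0654"   # BEH followed by HAMZA ABOVE
    precomposed = "\u08A1"      # BEH WITH HAMZA ABOVE

    # NFC alone keeps the two forms distinct.
    nfc_equal = (unicodedata.normalize("NFC", sequence)
                 == unicodedata.normalize("NFC", precomposed))
    print("equal under NFC:", nfc_equal)

    # The derived transform treats them as the same identifier.
    derived_equal = (nfc_plus_rules(sequence)
                     == nfc_plus_rules(precomposed))
    print("equal under NFC + rules:", derived_equal)
```

The point of the sketch is only that the table has to come from
somewhere.  If it is to be derived from Unicode properties rather
than maintained by hand, those properties have to exist first.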

There is an alternative that some people who have been at this
since before IDNA2003 may remember and that I _really_ hope we
don't need to think about (again).  If reasonable identifiers
--reasonable from a user standpoint, with a good balance for
users between false negatives and false positives, and so on--
are impossible given the constraints from Unicode below and the
DNS above, as a few people seem to be suggesting, then, in
principle, we could abandon IDNs and go back to an "above DNS"
approach.  I wouldn't want to be the one to explain that to the
domain or other communities, even after recognizing that, in
practice, it is what most users, using most browsers, are
experiencing with the web today.

    john