I-D Action:draft-faltstrom-5892bis-02.txt

John C Klensin klensin at jck.com
Tue Feb 22 15:29:18 CET 2011



--On Monday, February 21, 2011 22:34 +0100 Simon Josefsson
<simon at josefsson.org> wrote:

>...
> Again, what practical incompatibility is there in following
> Mark's proposal?  An illustration would go a long way to
> convince me here, as I believe Mark has illustrated that by
> _not_ adding another exception to the list of exceptions, we
> will create two incompatible IDNA2008 algorithms: IDNA2008
> with Unicode 5.2/6.0 vs IDNA2008-with-RFC5892bis with Unicode
> 6.0.

Simon,

Vint and Patrik have covered part of what I would have said had
I gotten to this earlier and I think Martin's observation is
very important.   Patrik has said what I'm going to say below
from a slightly different viewpoint.  Vint, especially, explains
the motivation for different views of predictability.  Keeping
in mind that the Internet continues to grow and reach more
diverse populations, especially ones using characters from
scripts that haven't been used much in the past, predictability
and character-list stability may not be the same thing... and
intuitive predictability may be more important.

That said, let me try to focus narrowly on what you asked and to
explain, in slightly different terms, what I tried to say before.

If one considers the standard in terms of stability of normative
rules (stability that inherently includes any exception lists),
then one gets maximum stability and predictability by not making
changes to the rules (with the exception list being part of the
rules even if not the algorithm).   That is what the current
version of Patrik's draft does.   Doing things that way has the
predictability and accuracy properties that Vint discussed; it
is also in the spirit of the "don't make rule changes, or even
exception list changes, for new versions of Unicode unless
something very dramatic occurs" principle that is a major
element of IDNA2008.  Of course, as Martin and others have
pointed out, applying stable rules to a new version of Unicode
results in changes, if only by moving code points from
UNASSIGNED to PVALID (or other categories) -- a change that is
significant because of the IDNA2008 prohibition on looking up
UNASSIGNED characters.
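
To make that concrete, here is a minimal, purely illustrative
sketch (in Python, and nothing like the full RFC 5892 derivation;
the exception subset and the category test are deliberately crude
stand-ins) of what "stable rules applied to a changing Unicode
database" means in practice:

    import unicodedata

    # Illustrative subset of the exception list; the real list is
    # part of the rules even though it is not itself algorithmic.
    EXCEPTIONS = {0x00DF: "PVALID", 0x03C2: "PVALID"}

    def derived_property(cp):
        if cp in EXCEPTIONS:        # exception list is consulted first
            return EXCEPTIONS[cp]
        category = unicodedata.category(chr(cp))
        if category == "Cn":        # not assigned in *this* UCD version
            return "UNASSIGNED"
        # crude stand-in for the LetterDigits and remaining rules
        if category in ("Ll", "Lo", "Lm", "Mn", "Mc", "Nd"):
            return "PVALID"
        return "DISALLOWED"

The function never changes, but running it against a newer
unicodedata module (i.e., a newer UCD) is exactly what moves newly
assigned code points out of UNASSIGNED and into PVALID or
DISALLOWED.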

One of the properties of this algorithmic approach is suggested
by an extrapolation from RFC 6055 (draft-iab-idn-encoding before
today).  Whether one is concerned with premature application of
Punycode (conversion to A-labels) or not, it is undesirable to
have the same code point treated in different ways depending on
where it is encountered in a lookup process.  While, IMO, this
remains an edge case, that is exactly what we do to ourselves if
we say "apply whatever Unicode properties you have for this
character unless it is in an IDNA context, in which case you
also need to examine an override/exception list".

And, fwiw, my personal view is that we should not change this
because the change has not been justified, and because it is
better to stay as close as possible to Unicode properties (even
corrected ones) than to blindly make a Section G addition in
order to undo a correction the Unicode Consortium presumably
considered important.   We
should not lose track of the fact that, had Unicode not changed
a character property, we wouldn't be having this discussion.
That property was changed to correct an error, but maybe that is
less an issue for IDNA than for Unicode.  The fact that I
believe the particular character is an edge case is relevant to
my thoughts about how much time we should be spending on this,
but is not the reason for my personal conclusion about it.


If one thinks, contrary to the design that is quite explicit in
the IDNA2008 specs and discussions, that the rules are just
guidelines for constructing tables and it is the tables that are
normative, then it is entirely natural to define stability in
terms of compatibility between older tables and newer tables,
with "compatibility" defined as "any value that is assigned
never changes properties".   As far as I can tell, that is the
way Unicode works: whatever rules exist for classifying
characters, they are just guidelines (perhaps strong ones).
What is normative is the tables and lists of properties in those
tables (or, to be more precise, the list of properties in those
tables that have been defined as stable).  It is certainly the
way that IDNA2003 and Stringprep worked.   But it is not the way
IDNA2008 is designed and, IMO, it is not reasonable to apply the
design of another model to IDNA2008 decisions as if that
alternate design were ultimate truth.

Within the IDNA2008 model, it is certainly possible to add
(Section G) exception cases to more nearly simulate the "table
stability" model.   But some of us believe that exception cases
are themselves a bad idea, leading to reduced predictability,
and need to be justified on that basis, not just applied because
of some correction somewhere else.
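
As a sketch of what that simulation looks like in code (again
purely illustrative; the code point and the frozen value below are
placeholders, not a proposed registry entry), an exception is
simply checked before any of the property-based rules run:

    import unicodedata

    # Hypothetical Section G addition freezing one code point at an
    # old value; illustrative only, not a proposal.
    SECTION_G_ADDITIONS = {0x19DA: "PVALID"}

    def derived_property_with_override(cp):
        if cp in SECTION_G_ADDITIONS:       # checked first, so later
            return SECTION_G_ADDITIONS[cp]  # UCD changes become moot
        # simplified stand-in for the ordinary rule-based derivation
        category = unicodedata.category(chr(cp))
        if category in ("Ll", "Lo", "Lm", "Mn", "Mc", "Nd"):
            return "PVALID"
        return "DISALLOWED"

The price is that the answer for that code point no longer follows
from its Unicode properties at all, which is the kind of
divergence discussed earlier in this message.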

In either case, the issue isn't "for stability" or "against
stability".  As far as I can tell, we are all "for stability".
We just look at stability --and what needs to be stable--
somewhat differently.  The only complete stability in this
environment is achieved by the IDNA2003 model (or a
possibly-bizarre interpretation of it): build tables based on a
particular version of Unicode, treat those tables as normative,
and treat changes (either to properties of characters already
defined or the addition of new characters) as irrelevant to
IDNA... forever or until the community is willing to tolerate a
really major change.

For whatever it is worth, I think the WG understood the
tradeoffs involved between the two models when IDNA2008 was
being designed and approved.  As Mark has pointed out, he (and
others) suggested several arrangements to the WG that would have
guaranteed that any label that was once valid would stay valid
and vice versa.   The WG did not accept them.   The WG also
discussed whether Section G should be used (automatically or
not) so that any correction to Unicode that could change the
validity of labels would be prevented from being effective for
IDNA.  The WG decided against that too, preferring case-by-case
decisions at least until more experience accumulated.

My personal view is that the worst sort of instability of all is
to send a message to the community that IDNA2008 itself is
unstable because each new version of Unicode (or perhaps just
this one, but how can one predict?) gives us an opportunity to
reopen questions about its fundamental design... and, as Patrik
has pointed out, reopen those questions, reach a fairly clear
consensus decision, and then reopen them again.  But it seems
that we are determined to do just that and send just that
message.

    john




