WGLC: draft-faltstrom-5892bis-04.txt

Wed Apr 27 20:58:08 CEST 2011

--On Wednesday, April 27, 2011 10:06 -0400 Andrew Sullivan
<ajs at shinkuro.com> wrote:

>...
> Implicit in the decision in the past about these sorts of
> cases was that we'd treat them case by case.  The reason to do
> that was that some (potential) incompatibilities are more
> serious than others.

We have been over this before, but there is another reason that
keeps getting lost when someone says "it is incompatible".
There really isn't ever going to be a choice between
"incompatibility" and "no incompatibility".  It is a tradeoff
between two different choices of incompatibility:

	* We preserve the earlier interpretation of a character.
	That requires compensating for a change in Unicode
	properties by putting the character into a table of
	exceptions (a table that we have not yet actually had to
	create and start putting entries into).

	* We preserve the alignment between IDNA2008 and
	evolving-version Unicode properties by letting the
	character's interpretation change if its Unicode
	properties are changed.  That is the "do nothing" option
	as far as IDNA2008 (and RFC 5892 in particular) are
	concerned -- IDNA2008's rules and tables are unchanged,
	but the character status changes.

That distinction is important because the underlying design
model for IDNA2008 is that it is a specification of rules about
property values (with exceptions as absolutely needed), not a
table of characters and their interpretations (the latter was
the model with Stringprep and Nameprep).  While I assume that
others may understand things differently than I do, that implies
to me that the "normal" (and stable) action for new versions of
Unicode is simply to run the existing rules against the
Unicode-updated character and property list.  If Unicode decides
that a property was sufficiently in error that they should
reclassify the character despite whatever disruptions that would
have (to our work or that of anyone else), then the IETF/IDNA
default, IMO, should be that we don't second-guess that choice
or its importance, _especially_ in an environment in which,
however many labels appear in the DNS today for a previously
PVALID character (or that might have appeared if the character
were previously DISALLOWED), there will be more in the future,
just because of the growth in registrations.
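
To make the two choices concrete, here is a toy sketch of a
rules-over-properties classifier with an optional exceptions table.
It is an illustration only and NOT the RFC 5892 algorithm: the
category set, the function names, the placeholder code point, and
its "old" and "new" property values are all invented for the example.

```python
from typing import Callable, Dict

# Assumption: a made-up set of General_Category values that this toy
# rule accepts. The real RFC 5892 rules are considerably more involved.
ACCEPTED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def classify(cp: int,
             category_of: Callable[[int], str],
             exceptions: Dict[int, str]) -> str:
    """Derive a status from property values; exceptions override the rule."""
    if cp in exceptions:
        return exceptions[cp]
    if category_of(cp) in ACCEPTED_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


def lookup(props: Dict[int, str]) -> Callable[[int], str]:
    """Wrap a property table; unlisted code points count as unassigned."""
    return lambda cp: props.get(cp, "Cn")


# Hypothetical data: a placeholder code point whose category "changes"
# between two imaginary Unicode versions. This is not real history.
CP = 0xE000
old_props = {CP: "Lo"}
new_props = {CP: "So"}

# "Follow Unicode": same rules, new property list, status changes.
status_old = classify(CP, lookup(old_props), {})
status_new = classify(CP, lookup(new_props), {})

# "Preserve the earlier interpretation": compensate via an exception.
status_preserved = classify(CP, lookup(new_props), {CP: "PVALID"})

print(status_old, status_new, status_preserved)
```

In the first case the rules never change and the character's status
moves with Unicode; in the second the status is held fixed, at the
cost of one more entry in an exceptions table that then has to be
maintained indefinitely.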

Remember too that the WG's decision (which I hope we don't need
to reopen) was that, all things being equal, a change from
DISALLOWED to PVALID is no less problematic than a change from
PVALID to DISALLOWED.  That is at least partially because, if
someone wants to register a label that would naturally include a
DISALLOWED character, people will make compromises and register
whatever they consider as similar as possible.  If the
character is later reclassified to PVALID, while there is no
issue with a label becoming invalid, we end up with the same
problem we are now facing with Sharp S, Final Sigma, and the
previously "mapped to nothing" joiner characters (and had some
years ago with the introduction of decorated Latin characters):
there is no way to know whether previous registrations were
intended as compromises with what was possible or were intended
to be in that form, creating a need for either special
registration models (e.g., sunrise reservations), alias
registrations in some form, or both.

The reality is that any change in properties and therefore IDNA
classification of a code point between versions of Unicode is
going to be disruptive.  I think we need to assume (and hope)
that Unicode will not make such changes unless they are really
necessary and justified and will exert appropriate caution in
adding characters to make the odds of needing to reclassify them
very low. But, if they do make such a change and we continue to
believe in the "Unicode version independent" model of IDNA2008,
our default, IMO, needs to be "follow Unicode" in the absence of
strong and material evidence that the change would be unusually
and unacceptably disruptive to IDNA.  Otherwise, it seems to me
that we lose that model and, instead, replace the IDNA2003 model
of "Unicode 3.2 forever" with one that uses rolling snapshots of
Unicode based on each version's property lists, i.e., that we
effectively hold properties immutable that the Unicode
Consortium, in its wisdom, feels free to change.

> IF we were changing the rules around
> (say) the character "0", I'm quite sure that the reaction
> would be different.

Indeed.

    john