Tables: BackwardCompatible Maintenance

Wed Dec 10 08:15:06 CET 2008

You're saying you don't want "automatic incorporation of changes initiated
by the Unicode community". But what Ken is proposing is precisely the
reverse: it is *preventing* automatic incorporation of changes in
pre-existing characters initiated by the Unicode community.

Having automatic generation of the BackwardCompatible table would ensure
that changes to pre-existing characters from Unicode would *not*
automatically be picked up by IDNA. It insulates IDNA from any Unicode
changes to previously assigned characters. If we *don't* put into place
something like what Ken proposes, we *do* get "automatic incorporation of
changes initiated by the Unicode community".

This may seem paradoxical, but let's look at how the situation plays out
with a concrete example.

Suppose that X is a letter and Y is a symbol in Unicode 5.1. The Unicode
consortium receives information that X is actually a symbol and Y is a
letter, and makes that change in Unicode 5.2 next fall. And let's say half
of the implementations of IDNA (both lookup clients and registries) update
to Unicode 5.2 when released and half don't. (In reality implementations
will update in a much more haphazard way, in accordance with whatever their
ship schedule is. A search engine like Google typically updates
almost immediately, while a browser sitting on someone's machine might not
be updated for months, or years. But for the sake of illustration, let's say
that this is half and half.)

Scenario A. BackwardCompatible *is not* automatically generated by IANA.

   1. The updated implementations dutifully run the algorithm in Tables, and
   now consider X to be DISALLOWED, and Y to be PVALID.
   2. The unupdated implementations still consider X to be PVALID and Y to
   be DISALLOWED.
   3. So updated registries now disallow their registrations with X
   (refunding people's money?), and allow registrations with Y. And updated
   lookup clients will refuse to look up X, and will look up Y.
   4. Unupdated registries still allow registrations with X, and disallow
   registrations with Y. And unupdated lookup clients will allow lookups with
   X, and will refuse to look up Y.
   5. There will be obvious conflicts among mismatched implementations and
   registries; and these conflicts are not the result of new characters; they
   are among characters that all of the implementations are meant to handle.
   6. Then the IETF decides this is a bad situation, starts the whole
   process of doing a new RFC. Some time later, this adds X and Y to
   BackwardCompatible to restore the previous situation.
   7. Some subset of the updated implementations now pull in the new
   BackwardCompatible list, and now they revert to the 5.1 behavior for these.
   Others don't update to that table immediately, and still conflict with
   unupdated implementations until they are updated.
   8. This is a bit of a mess, and an avoidable one.
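
Steps 1 through 5 of Scenario A can be illustrated with a rough sketch. It
assumes a drastically simplified stand-in for the algorithm in Tables (letters
are PVALID, everything else is DISALLOWED); the real derivation uses more
properties than this. The names derive_status, UNICODE_5_1 and UNICODE_5_2 are
hypothetical, and X and Y are the placeholder characters from the example
above.

```python
# Hypothetical, simplified stand-in for the Tables algorithm:
# letters (general category L*) are PVALID, everything else is DISALLOWED.
def derive_status(general_category):
    return "PVALID" if general_category.startswith("L") else "DISALLOWED"

# Assumed property data for the placeholder characters X and Y.
UNICODE_5_1 = {"X": "Lo", "Y": "So"}  # X is a letter, Y is a symbol
UNICODE_5_2 = {"X": "So", "Y": "Lo"}  # the two have been swapped

def implementation_view(properties):
    """Statuses as computed by an implementation built on one Unicode version."""
    return {cp: derive_status(gc) for cp, gc in properties.items()}

unupdated = implementation_view(UNICODE_5_1)
updated = implementation_view(UNICODE_5_2)

# Characters on which the two groups of implementations now disagree.
conflicts = sorted(cp for cp in unupdated if unupdated[cp] != updated[cp])
print(conflicts)
```

With no BackwardCompatible entries in play, both pre-existing characters end
up in the conflict list, which is the mismatch described in step 5.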

Scenario B. BackwardCompatible *is* automatically generated by IANA. (Ken's
proposal).

   1. The implementations that update to Unicode 5.2 dutifully run the
   algorithm in Tables, with the BackwardCompatible table generated for Unicode
   5.2.
   2. Because IANA has updated BackwardCompatible, they won't change X and Y
   from what they were in Unicode 5.1.
   3. No compatibility problems for X and Y.
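
As a minimal sketch of what automatic generation in Scenario B might look
like: for every previously assigned character whose derived status would
change under the new Unicode version, pin the old status in
BackwardCompatible. This is an illustration under stated assumptions, not the
procedure from Ken's proposal or from Tables. The derivation rule is the same
simplified stand-in (letters are PVALID, everything else DISALLOWED), and
generate_backward_compatible and effective_status are hypothetical names.

```python
# Hypothetical, simplified stand-in for the Tables algorithm.
def derive_status(general_category):
    return "PVALID" if general_category.startswith("L") else "DISALLOWED"

def generate_backward_compatible(old_props, new_props, old_table=None):
    """Pin the previous status of every already-assigned character whose
    derived status would otherwise change in the new Unicode version."""
    table = dict(old_table or {})
    for cp, old_gc in old_props.items():
        # Status the character had before the update (an earlier pin wins).
        previous = table.get(cp, derive_status(old_gc))
        # Assumes cp is still assigned in the new version.
        if derive_status(new_props[cp]) != previous:
            table[cp] = previous
    return table

def effective_status(cp, props, backward_compatible):
    """BackwardCompatible entries override the derived status."""
    return backward_compatible.get(cp, derive_status(props[cp]))

# Assumed property data for the placeholder characters X and Y.
UNICODE_5_1 = {"X": "Lo", "Y": "So"}
UNICODE_5_2 = {"X": "So", "Y": "Lo"}

table_5_2 = generate_backward_compatible(UNICODE_5_1, UNICODE_5_2)
for cp in ("X", "Y"):
    print(cp, effective_status(cp, UNICODE_5_1, {}),
          effective_status(cp, UNICODE_5_2, table_5_2))
```

Under these assumptions an implementation updated to 5.2 computes the same
status for X and Y as one still on 5.1, so no conflict arises for
pre-existing characters.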

Scenario B does *not* preclude the IETF from considering the situation with
X and Y, and deciding to go with all or part of the Unicode 5.2 change
(although I'd recommend against it!). They could then issue a new RFC that
changes the ExceptionTable so as to add X and Y (or just one, if desired).
Such a step would still cause some problems with incompatible
implementations, but the release of the RFC would be at the IETF's pace, the
compatibility problems would be known (and worth the price according to the
IETF, or it wouldn't do it), and we wouldn't get the back and forth
instabilities where a character goes from PVALID to DISALLOWED and back, or
the reverse.

Having these kinds of procedures is nothing new for the IETF. BCP47, for
example, has had this kind of process in place. That is, in the
specification it has precise rules for how to update tables based on what
its source standards (ISO in this case) do, so as to maintain stability.

Now, I do think that the likelihood of the X/Y case happening is quite
small, and essentially zero with any frequently used characters. Clearly "A"
is not going to change to a Symbol! But the likelihood is not zero. For the
small number of exceptions that we've had to a similar
backward-compatibility table for Unicode identifiers (over the span of many
versions), see http://www.unicode.org/reports/tr31/#Backward_Compatibility.

But as long as we are devising this process, the changes that Ken proposes
in order to make it completely bullet-proof are small, and give us the
security that no matter what happens in future Unicode versions, stability
of IDNs is guaranteed even without further action by the IETF.

Mark

On Tue, Dec 9, 2008 at 18:25, Erik van der Poel <erikv at google.com> wrote:

> Martin and Ken,
>
> I believe the point is that *if* a relevant Unicode property changes,
> no matter how unlikely, then the IETF and other DNS stakeholders ought
> to get a chance to consider whether or not to make the consequent
> incompatible change to IDNA. After all, if the Unicode community
> decided that the change was important enough to make, then the DNS
> community may also decide that it is important enough to change on the
> IDNA side.
>
> So, we do not want automatic incorporation of changes initiated by the
> Unicode community. Instead, we want the DNS community to make the
> right decision for DNS, which is something the Unicode community is
> not qualified to do.
>
> Just look at what happens when IDNA blindly adopts Unicode specs: The
> eszett/ss debacle. Now we have to make a painful change for that,
> because the German registry insists that eszett be included. Of
> course, the German registry should have voiced their opinion when
> IDNA2003 was being drafted, but...
>
> So, we need a group of experts to first try the automatic derivation,
> and then see whether there are any PVALID->DISALLOWED or
> DISALLOWED->PVALID changes. If there are, then the issue must be
> discussed, probably via Internet Drafts, possibly without restarting
> the WG. Then each character must be put into BackwardCompatible or
> Exceptions, depending on the decision. (Or it might be simpler to
> always put such characters into Exceptions, and just not have a
> BackwardCompatible category.)
>
> Erik
>
> On Tue, Dec 9, 2008 at 5:52 PM, Martin Duerst <duerst at it.aoyama.ac.jp>
> wrote:
> > But isn't this exactly what we do NOT want? If I understand
> > correctly, BackwardCompatible is a purely administrative thing,
> > ideally we give IANA an algorithm, and they execute it and
> > put the result into the registry when there is an update
> > to Unicode properties that requires a balancing entry in
> > BackwardCompatible. Bothering the IETF at large with this
> > is pretty useless; having an expert reviewer as a "goto guy"
> > for IANA will probably help.
> >
> > On the other hand, for Context, changes are usually something
> > new and unexpected, and potentially even political. A somewhat
> > more serious/heavy process seems appropriate.
>