IANA actions and tables document

Thu Dec 11 02:52:36 CET 2008

John wrote in response:

> >> 1) IANA is to keep registries of the following sections of
> >> the tables   document:
> >... 
> >> 1.2) 2.7.  BackwardCompatible (G)
> >...
> >> 2) Changes
> >> 
> >> 2.1) Changes to 1.1, 1.2 and 1.3 above require IETF action
> > 
> > I disagree. Changes to 1.1 and 1.3 should require IETF action,
> > but as Mark tried to explain, if you don't make 1.2 part
> > of the automatic process, all that happens is that you end
> > up potentially introducing *instability* into IDN's between
> > Unicode versions. That is the opposite of what the intent of
> > Section 2.7 is for.
> 
> Ken,
> 
> I'm confused.

As is, seemingly, nearly everybody participating in this thread. :-( 

> As I've understood it, category G (1.2 above) was
> included to deal specifically with the (we hope extremely rare)
> case in which changes were made to Unicode (and/or its various
> annexes and reports) that required some special action to
> preserve IDNA compatibility.

No, that is too broadly construed, which is part of the
problem here.

2.6 Exceptions (F) is the category for, well, *exceptions*.
That is the category that essentially allows the IETF to
discuss and decide, for *any* Unicode character, current
or future, if there is some compelling case where IDN behavior
needs to be defined as differing from what would otherwise
be derived on the basis of applying Unicode character
properties by Section 3.

If the Unicode Standard (and/or its various annexes and reports)
change in some way that requires some *special* action to
preserve IDNA compatibility -- or for that matter some *special*
action to meet a new constituency requirement for IDN's that
isn't covered by general character properties -- then
explicit (and debated and reviewed) modification of the
Exceptions table is how you get that to happen.

2.7 BackwardCompatible (G) is quite different. It is there
to be the *automatic* guarantee that no change in Unicode
character properties between versions of Unicode will result
in kicking an IDNA PVALID character into DISALLOWED status,
which would be a very, very bad thing. (See Mark's analysis
of the resulting interoperability chaos.)

Adding something to the BackwardCompatibility list is *not*
something that needs explicit debate and review. In order
to preserve stability for IDNs, *if* it ever happens
(and we do hope it is an extremely rare case) that for a
new version of Unicode, the derivation in Section 3 ends
up turning a formerly PVALID character to DISALLOWED, then
you just automatically add it to the BackwardCompatibility
list to *force* it to stay PVALID, thereby keeping all your
implementations backward compatible, even during any
transition period when some might be upgrading and others not.

>   One can imagine changes that
> would upset IDNA compatibility but not more general Unicode
> compatibility (for example, I think we are dependent on some
> properties that are not guaranteed to be stable).

That is basically irrelevant to the distinction here.

> That inherently requires a judgment call on someone's part --
> the IETF, UTC, or someone else-- which means it cannot be
> automatic.

No, it does not. This is a completely automatic decision,
and most assuredly should *NOT* require a judgement call:

Unicode 5.2: derivation shows code point X is PVALID for IDN's.

Unicode 5.3: derivation shows code point X is DISALLOWED for IDN's.

   RED FLAG ALERT: THIS IS BAD FOR IDNA!!

   (Automatic) Response: Add code point X to BackwardCompatible table.

   Result: X --> PVALID, as it should be, for backwards compatibility.
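
To show how little judgement is involved, the check above can be sketched in a few lines of Python. This is only an illustration, assuming the Section 3 derivation for each Unicode version is available as a mapping from code point to status; the function name `update_backward_compatible` and the value chosen for code point X are hypothetical and appear nowhere in tables.txt.

```python
PVALID = "PVALID"
DISALLOWED = "DISALLOWED"

def update_backward_compatible(old_derived, new_derived, backward_compatible):
    """Return a copy of the BackwardCompatible table that also pins every
    code point whose derived status went from PVALID to DISALLOWED."""
    updated = dict(backward_compatible)
    for cp, old_status in old_derived.items():
        if old_status == PVALID and new_derived.get(cp) == DISALLOWED:
            # Red flag: force the code point to stay PVALID.
            updated.setdefault(cp, PVALID)
    return updated

# Hypothetical code point X, chosen only for illustration.
X = 0x1234
derived_old = {X: PVALID}       # derivation under the older Unicode version
derived_new = {X: DISALLOWED}   # derivation under the newer Unicode version

table_g = update_backward_compatible(derived_old, derived_new, {})
print(table_g)
```

Nothing in the function asks anyone for an opinion: the input is two derived tables, and the output is the updated table.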

> 
> What am I missing?

Well, what I just tried to explain.

Seriously, Mark and I don't have some hidden agenda here. We
are simply trying to help y'all put in place the mechanism that
lets you guarantee the kind of IDN stability across Unicode
versions that you are, per the rationale and protocol documents,
trying to have. And without having to revise and update the
RFC's for every Unicode version, which is also your stated goal.

This is a tried and true mechanism, by the way. We have been
using it for years, internal to the Unicode Standard, to
guarantee stability of the *identifier* definitions based on Unicode
character properties.

The only difference here is that the specification of the
particular kind of identifier in question here, IDN labels,
is *external* to the Unicode Consortium, and Mark and Michel and I
seem to be having difficulty explaining to everybody involved
how stability guarantees for identifiers between versions
of a dynamically updated character encoding (i.e. Unicode)
work.

Please do not get stuck in the conceptual trap of assuming
that because something gets handled exceptionally in a derivation,
it is thereby necessarily something that requires a
judgement call and (given IETF process) requires throwing the
whole specification (or some significant part of it) back
into the ring for extended debate -- the way the discussions
of sharp-s, final sigma, Hangul jamos and other such
"exceptions" have.

People seem to be assuming that if something is a "special
case", then of course it has to engage IETF debate and
decision. But not all special cases are alike. I think
people are extrapolating from sharp-s and final sigma and
Turkish i into the unknown, but the result of the extrapolation,
the way you are heading currently, will be to guarantee
*in*stability in the specification, rather than stability.

And, by the way, what Mark and I are proposing as the fix
for Section 2.7 BackwardCompatible in no way precludes
IETF review and decision for any character change. A decision
to modify 2.6 Exceptions in some way, either to make
an otherwise PVALID character DISALLOWED, or CONTEXTO,
or to make an otherwise DISALLOWED character PVALID or
CONTEXTO, would always trump anything automatic being
carried forward in the BackwardCompatible table. So there
is the escape valve for your judgement decisions.
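
The precedence just described can be sketched as a simple lookup order. This is a minimal sketch, assuming the two tables are mappings from code point to status and that the property-based derivation is passed in as a function; `idna_status` and `derive_after_change` are hypothetical names used only for illustration.

```python
def idna_status(cp, exceptions, backward_compatible, derive):
    """Resolve a code point's status, most specific rule first."""
    # 2.6 Exceptions (F): an explicit IETF decision always wins.
    if cp in exceptions:
        return exceptions[cp]
    # 2.7 BackwardCompatible (G): automatic stability entries come next.
    if cp in backward_compatible:
        return backward_compatible[cp]
    # Otherwise fall through to the property-based derivation.
    return derive(cp)

def derive_after_change(cp):
    # Stand-in for a derivation that now yields DISALLOWED for everything.
    return "DISALLOWED"

backward_compatible = {0x2E2F: "PVALID"}

# No IETF decision: the automatic entry keeps the character PVALID.
automatic = idna_status(0x2E2F, {}, backward_compatible, derive_after_change)

# IETF decides otherwise: the Exceptions entry trumps the automatic one.
overridden = idna_status(
    0x2E2F, {0x2E2F: "DISALLOWED"}, backward_compatible, derive_after_change
)
print(automatic, overridden)
```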

In one more attempt to make this clear with an example,
let me try again.

Unicode 5.1 has some character that might be construed as
marginal in its General_Category assignment. Let's
pick:

U+2E2F VERTICAL TILDE

Currently that is General_Category=Lm (Modifier letter), which
means that by Unicode rules it is o.k. for identifiers,
and by the IDNA derivation, it is also PVALID for IDN's.
It also means that default word selection behavior would
allow it to be included inside word boundaries.

Now as far as we know, that character is only used in Old
Church Slavonic manuscripts (and maybe some other old
Cyrillic materials). What if the OCS scholarly community
came to the UTC in the future and were to insist
forcefully that, "No, no, no, this really is just a symbol
in our usage, and shouldn't be part of word selection."
There are various possible responses to that -- one of
which (although not inevitably, by any means) would be for
the UTC to recategorize the character as General_Category=Sk
(Modifier symbol).

*If* it did that, the UTC would also require stability fixes,
because changing Lm->Sk is the kind of change that impacts
the derivation for identifiers. So U+2E2F would also end
up with Other_ID_Start=True, to guarantee identifier stability,
although the chances of anybody actually having used
VERTICAL TILDE in an identifier are really small.

So say this change rolls out in Unicode 6.X (or whatever).
Its impact on IDNA would be to have the Unicode 5.2
value of U+2E2F as PVALID suddenly show up as DISALLOWED,
instead, by the derivation in Section 3 of tables.txt.

What should be your response? Well, the IETF, per se, shouldn't
have to debate this and make some complicated decision.
You just automatically add U+2E2F to the IANA
IDNA_Backwards_Compatibility_Table as of Unicode 6.X,
and everything continues hunky-dory for IDN's -- just
as if the UTC hadn't decided to change anything at all.
Precisely the desired outcome, I think.
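
For anyone who wants to see the U+2E2F scenario worked through, here is a rough Python sketch. It assumes a deliberately simplified derivation that looks only at General_Category (the letter, digit and mark categories count as PVALID, everything else as DISALLOWED) and ignores every other rule in Section 3; `toy_derive` is a hypothetical name, and the Lm->Sk change is the hypothetical one discussed above, not anything the UTC has decided.

```python
# Simplified stand-in for the derivation: General_Category only.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

def toy_derive(general_category):
    """Toy rule: PVALID for letter/digit/mark categories, else DISALLOWED."""
    if general_category in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"

VERTICAL_TILDE = 0x2E2F

before = toy_derive("Lm")  # today: Modifier letter
after = toy_derive("Sk")   # hypothetical future: Modifier symbol

# The automatic response: pin the character if its status regressed.
backward_compatible = {}
if before == "PVALID" and after == "DISALLOWED":
    backward_compatible[VERTICAL_TILDE] = "PVALID"

# What implementations actually see once the table is consulted first.
effective = backward_compatible.get(VERTICAL_TILDE, after)
print(before, after, effective)
```

The derived value flips, but the effective value does not, which is the whole point of the table.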

Now, *if* a change like that does occur, however unlikely
(and at this point, that is precisely
the *kind* of character for which it might even be possible
that the UTC would consider such a change), and *if*,
even more improbably, somebody in
the IETF community or among the various NICs and others
concerned with registration of domain names decides
that the *reason* for this change for U+2E2F VERTICAL TILDE
by the UTC might render the character suspect in domain
names, so that the IETF ought to disallow it formally
by protocol (or require a context rule for it), then
the IETF could decide to modify the 2.6 Exceptions
list in tables.txt (or the IANA table registered for it),
and do whatever it likes in terms of disposition
for U+2E2F VERTICAL TILDE in the future. That is up
to the IETF and out of the hands of the UTC -- as it
should be for anything the IETF decides needs *special*
attention on its own terms.

So there is your realistic scenario -- as unlikely as
it will be in practice.

Please do *not* confuse this with utterly different
(and far more likely to occur and to be controversial)
special cases. For example, suppose the German government
decides to change the German orthography and specifically
and by law endorses an uppercase German sharp-s. Then
*everybody* is going to have to adapt their software
and its case-handling, and that would extend to special
case mapping for IDNA. That's the kind of situation that
you do *not* want handled by some automatic process
in 2.7 BackwardCompatible (G).

--Ken