Consensus Call Tranche 3 (Permanence) Summary

Tue Oct 21 07:00:19 CEST 2008

YES - 12
NO -  6

There seems to be a strong preference either for the statements as
they stand or for somewhat stronger formulations.

Should we start a thread to consider the language that Mark Davis  
offers as an alternative?

-------------------
(3) Permanence of DISALLOWED and PROTOCOL-VALID

(3.a) Once a character is classified as PROTOCOL-VALID, it
will remain in that category for all future versions of the
protocol and tables unless serious and unanticipated
circumstances occur.  (R.10)

(3.b) Once a character is classified as DISALLOWED, it will
remain in that category for all future versions of the protocol
and tables unless serious and unanticipated circumstances
occur.  Note that UNASSIGNED characters are not, for this
purpose, DISALLOWED.  (R.5)
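
As a rough illustration of what (3.a) and (3.b) would mean for anyone
comparing two versions of the tables, here is a minimal Python sketch.
The table representation (a dict from code point to category name), the
function name permanence_violations, and the sample data are all
assumptions made for this example; none of them come from the drafts.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: code point -> category name.
Table = Dict[int, str]

# Categories whose membership (3.a) and (3.b) declare permanent.
PERMANENT = {"PVALID", "DISALLOWED"}


def permanence_violations(old: Table, new: Table) -> List[Tuple[int, str, str]]:
    """Return (code point, old category, new category) for every character
    that left PVALID or DISALLOWED between two table versions."""
    violations = []
    for cp, old_cat in sorted(old.items()):
        new_cat = new.get(cp, "UNASSIGNED")
        if old_cat in PERMANENT and new_cat != old_cat:
            violations.append((cp, old_cat, new_cat))
    return violations


# Made-up example data, not taken from any real IDNA table.
old_table = {0x0061: "PVALID", 0x00DF: "DISALLOWED", 0x0378: "UNASSIGNED"}
new_table = {0x0061: "PVALID", 0x00DF: "PVALID", 0x0378: "PVALID"}

for cp, before, after in permanence_violations(old_table, new_table):
    print(f"U+{cp:04X}: {before} -> {after}")
```

In this sketch a move out of UNASSIGNED is not reported, matching the
note in (3.b) that UNASSIGNED characters are not, for this purpose,
DISALLOWED.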

COMMENTS:

I am of the opinion that the wording must be such that changes are
made only in really, really rare events, so that in effect "it will
never happen", while still allowing the IETF to make changes if
really, really needed.
=====
I believe the permanence requirement should be stronger.  In other
words, I prefer to remove "unless serious and unanticipated
circumstances occur.".  With that change, the proposed text is fine with
me.

[comment by another WG member:
I don't care too much about the exact wording, because it seems
that we are all more or less in agreement with what we want:
Ideally, no changes, but nobody is able to completely predict
the future.]

Note that it will _always_ be possible to re-classify characters by
publishing a new IDNA standard that says "whatever IDNA2008 says, but
treat code point X as Y because Z".  This approach is clearer for
implementers, as it makes conformance simpler to explain.

[comment by another WG member:
It's slightly simpler to say "conforms to IDNAxxxx" than to say
"conforms to IDNA2008 including erratum xyz" (or some such), but
I don't think the latter is a serious problem.]

It also sends a strong message to the UTC that if they break the
normalization operation, they will break IDNA 2008.  PERIOD.  We should
not allow them to weasel out of a backwards incompatible normalization
change with reasons such as "well, this can be considered an
unanticipated circumstance, so you should simply change your
implementation because it is buggy" which they have done before.

[comment by another WG member:
Sending a strong message to any standards organization we depend
upon to be very careful with what they are doing is not a problem.

However, in the three cases I know about (two of them before IDNA2003,
one of them after), the UTC didn't break normalization at all,
on the contrary, it was fixing a bug that was found, after
very careful considerations of the consequences both of fixing
that bug and of not fixing that bug (and in the case of the
bug fix that (in theory) affected IDNA2003, after even looking at
existing implementations and assessing the difficulty for them to
fix the bug, in cases where the source code was available, including
the implementation written by Simon).

It is simply a fact that both implementations and specifications
are written by humans. It is a corollary that both implementations
and specifications occasionally contain bugs, and that when these
bugs are found, one has to carefully think about whether and how
to fix them. Treating specs as sacrosanct or written in stone
doesn't help at all.

[comment by another WG member:

Of course, but the normal approach is to revise the specification and
let applications and specifications upgrade to it to fix the problem.
My impression was that the UTC tried to fix existing implementations
that needed to depend on the unfixed specification.]

[comment by another WG member:
Okay, glad you agree with this [normal approach]

Thanks for telling us what went wrong in your opinion.
I don't think that impression is too wrong, I think many
people from the UTC tried to ask implementers
to just fix the implementations.]

In the case at hand, the bug in the normalization operation that
(in theory) affected IDNA2003 was carefully considered, and it
was concluded that:
a) The bug only affected certain character combinations that
    did not appear in practice, in particular not in domain names.

[comment by another WG member:
IDNA2003 supported profiles

[note by another WG member: that was NAMEPREP]

, and one of the profiles (SASLPrep) was used
to prepare passwords rather than domain names.  Passwords may contain
such strings, although unlikely.]

[comment by another WG member:
Yes, although highly, highly unlikely. The probability was
probably several factors (or even orders of magnitude) higher than
for domain names, but several orders of magnitude higher than 0 is
still 0.]

b) For the (nonexistent, see a)) data where the bug actually
    would have affected normalization, the outcome before the
    bug fix could lead to different or unexpected behavior for
    different implementations because the bug violated a
    (pretty obvious) idempotence assumption about normalization
    that the designers of IDNA2003 had implicitly made.

So in practice, the effect of the bug fix on IDNA2003 was
none, and just in case there ever was one, it would have
been positive.
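
The idempotence assumption mentioned in b) can be stated as a simple
property: normalizing an already-normalized string must change nothing.
A minimal Python sketch of that check follows, using the standard
unicodedata module. The helper name is_idempotent and the sample strings
are assumptions chosen for illustration; NFKC is used because that is
the form Nameprep applies. Python ships recent Unicode data, so this
does not reproduce the old bug, it only shows the property.

```python
import unicodedata


def is_idempotent(text: str, form: str = "NFKC") -> bool:
    """True if normalizing an already-normalized string changes nothing."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# Arbitrary sample strings chosen for illustration only.
samples = ["stra\u00dfe", "\ufb01le", "e\u0301", "\u1100\u1161\u11a8"]
for s in samples:
    print(repr(s), is_idempotent(s))
```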

[comment by another WG member:
Sure, but I'm talking about the process around integrating the fix, not
the fix itself.]

[comment by another WG member:
Okay. So your position would be: Fix (issue an erratum) for
IDNA 2003, then we'll fix the implementations. Very understandable.
The problem was that some of the people who would have been in
positions in the IETF to initiate such errata acted as if
the specs they depended on had no other choice than to be
cast in stone, and that the Unicode Consortium messed up
when they fixed an obvious bug in the spec.]

In conclusion, I think it's good for spec writers to know that
implementers expect specs to be stable, and to do everything
possible to keep it that way. However, it's also important
for implementers to understand that specs aren't infallible,
and in case of bug fixes (called "errata" in the case of specs)
to look at them with a clear and open mind, rather than to
continue to spread FUD.]

[comment by another WG member:
There were no errata for IDNA2003 in this area.]

[comment by another WG member:
So our conclusion here would be that there should have been?
I would definitely agree with that.]
============
The proposal makes promises it cannot guarantee. After all, many
people expected that once a URL was valid according to IDNA2003, it
would be valid forever, but with IDNA2008 we are breaking that, not
for one or two characters, but for many thousands of characters. And
for reasons that are not just "serious and unanticipated circumstances".
So the wording needs to be fixed.

3a. The permanence of Valid URLs is exceedingly important. The major
breakage between 2003 and 2008 is containable, but we don't want to
have this every 5 years. So we do need to have as strong a language as
we can regarding this. So I would say that the intent of this should
be not to change PVALID even in serious and unanticipated
circumstances. But we must also point out that even this could be
overridden in the future by an obsoleting RFC (as above). So I can
see wording something like the following.

Once a character is classified as PROTOCOL-VALID, it
will remain in that category for all future versions of the
protocol and tables unless this RFC is obsoleted. It is anticipated  
that such reclassification will never occur, given the backwards  
compatibility problems that would result.  (R.10)

3b. The permanence of Invalid URLs is a very different case. It means  
that we can't fix problems that arise, where for example, minority  
languages do need to use a character that ended up in the DISALLOWED  
bucket, because people didn't realize the need at the time the  
character was encoded.

Implementations MUST be set up to deal with formerly invalid URLs  
becoming valid all the time -- when the URL contains a code point  
that was unassigned in one version, and becomes assigned in the next.  
Making DISALLOWED be permanent does not really help implementations  
(even if people think it might), and is too strong to allow necessary  
changes. So in this case, the language needs to be clearer:

Once a character is classified as DISALLOWED, it will normally
remain in that category for all future versions of the protocol
and tables. However, if further evidence of the necessity of this  
character in some language is discovered, it may be moved to PVALID  
in Tables. Note that UNASSIGNED characters are not, for this
purpose, DISALLOWED.  (R.5)

==========
I'm agreeing to this, but I think its effect is less permanent than it
seems to imply: any new RFC could come along and obsolete these.  So
all it says is that, under this protocol and as long as this protocol
isn't obsoleted, the positive declarations (PVALID and DISALLOWED) are
permanent.
==========
The statement should be strengthened for the guarantee
about PVALID.

The statement should be weakened a bit about the
DISALLOWED category. The very fact that we have been
having a long argument about whether LATIN SMALL LETTER
SHARP S should be moved from DISALLOWED to PVALID
should illustrate the problem. We can argue such
edge cases ahead of time, but there is simply no guarantee
that somebody isn't going to argue a year or two or five
from now for inclusion of one or more additional such characters --
without wanting to obsolete the entire protocol and
RFC.
======
As I mentioned ages ago, our <registry_hat=registrar_hat="on"> contracts
with ICANN are not unbounded. I appreciate the UTC's interest in
permanence, but asking for permanence in a DNS defined by contract is as
silly as [[ insert whatever appropriate simile here... ]]

[comment by another WG member:
I think I understand the UTC perspective. Maybe I don't. I do understand
my contracts, which are renewable, mention time. Not month-to-month
time, but time.]

I don't think anybody is either asking for or expecting
that either the universe of domain names will be stable
or that any given domain name will be guaranteed in perpetuity
to resolve to a given IP address -- or any other guarantee of
the like.

What I think we are looking for is permanent stability for
the *protocol* we are designing here. So that search engines
(or other processes) that can parse out a given label and
hand it off for domain name resolution one month aren't
faced the next month with the exact same string being
treated as uninterpretable nonsense if they are to conform
to the protocol.

It is fine if an IDN resolves this month and *doesn't* resolve
next month. It isn't fine if I can conformantly interpret
it this month and *cannot* conformantly interpret it
next month.

And the timelines involved are not to the end of time, of course,
but we are certainly talking about deployed software whose
lifetime can easily exceed 10 years and distributed data which
may still be around after several decades.

=======
On PVALID, either the current wording or the ones suggested by Mark or
Andrew are fine with me too.

I agree that the statement about DISALLOWED
characters could be weakened. The need for reclassifying a DISALLOWED
character may arise, and if it does we may well decide to reclassify
it as CONTEXTO rather than PVALID, making it a less detrimental
change.

=========
For (3.b), DISALLOWED to CONTEXTO may happen.  It should be well
described.
=========
I could live with current statement. However, like many others I  
think the DISALLOWED statement should be weakened because as stated  
it is not really enforceable. Stability of PROTOCOL-VALID is much  
more critical.

Something such as: "Once a character is classified as DISALLOWED, it
SHOULD remain in that category for all future versions of the protocol
and tables."
=========

NOTE NEW BUSINESS ADDRESS AND PHONE
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com
