Comments on draft-ietf-idnabis-defs-10

Andrew Sullivan ajs at shinkuro.com
Tue Aug 25 05:24:40 CEST 2009


Dear colleagues,

As part of an effort to respond to the current WGLC of multiple
documents, I have read draft-ietf-idnabis-defs-10.  I am grateful to
the Chair for extending the deadline, and apologetic to the editors
that they've been made to wait.  I hope to be able to offer any useful
comments I might have (assuming there are such) before the new
deadline.  

In a previous comment (see
http://www.alvestrand.no/pipermail/idna-update/2009-July/004970.html),
I made a vague remark about something I find worrisome in this text
in §2.3.2.1:

   o  An "A-label" is the ASCII-Compatible Encoding (ACE, see
      Section 2.3.2.5) form of an IDNA-valid string.  It must be a
      complete label: IDNA is defined for labels, not for parts of them
      and not for complete domain names.  This means, by definition,
      that every A-label will begin with the IDNA ACE prefix, "xn--"
      (see Section 2.3.2.5), followed by a string that is a valid output
      of the Punycode algorithm [RFC3492] and hence a maximum of 59
      ASCII characters in length.  The prefix and string together must
      conform to all requirements for a label that can be stored in the
      DNS including conformance to the rules for the preferred form
      described in RFC 1034, RFC 1035, and RFC 1123.  A string meeting
      the above requirements is still not an A-label unless it can be
      decoded into a U-label.

So, to be less vague: the section is supposed to define certain terms,
and that bullet ought to define "A-label".  It does not.  It tells us
the necessary conditions for being an A-label, but not the sufficient ones.
This could be remedied if the last sentence said instead, "If and only
if a string meeting the above requirements can be decoded into a
U-label, then it is an A-label."  But I'm no longer sure that's true,
given that we've lived with the I-D definition so long and yet not had
it fully operationalized.  Is there anything else?  If there is, it
needs to be added.  These definitions, I say, must be completely
operationalized (or else we have no excuse to call this document the
definitions document).  Since people have to write code on the basis
of these definitions, they must be completely unambiguous.
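
To illustrate what an implementer can and cannot get from the bullet as
written, here is a minimal sketch in Python. It is not taken from the
draft. The function name and constants are mine, and it relies on the
"punycode" codec in Python's standard library as a stand-in for the
[RFC3492] algorithm. It checks only the necessary conditions listed in
the bullet; it makes no attempt to decide whether the decoded result is
a U-label, which is exactly the part the definition leaves open.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # 4-character prefix plus at most 59 further characters


def passes_necessary_checks(label: str) -> bool:
    """Return True if `label` meets the necessary conditions in the bullet.

    Hypothetical helper: a True result does NOT mean the label is an
    A-label, only that it has not been ruled out.
    """
    candidate = label.lower()  # DNS matching ignores ASCII case
    if not candidate.startswith(ACE_PREFIX):
        return False
    if len(candidate) > MAX_LABEL_LENGTH:
        return False
    # Preferred-form syntax: letters, digits, hyphen; no trailing hyphen.
    if candidate.endswith("-"):
        return False
    if not all(ch.isascii() and (ch.isalnum() or ch == "-") for ch in candidate):
        return False
    try:
        # The part after the prefix must be decodable by Punycode.
        candidate[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return True
```

A string can pass every one of these checks and still fail to be an
A-label, which is why the definition needs a stated sufficient
condition.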

In the message to which I referred above, I also objected to this:

   o  A "U-label" is an IDNA-valid string of Unicode characters, in
      normalization form NFC and including at least one non-ASCII
      character, expressed in a standard Unicode Encoding Form (in an
      Internet transmission context this will normally be UTF-8). 

The parenthetical remark, I think, encourages implementers not to
recognise as U-labels strings that come in as (say) UTF-32, but that
are otherwise perfectly valid.  Who cares what is normal in an
Internet transmission context, when we're defining terms?  Why does
that matter?
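
A small sketch of why the encoding form should be irrelevant to the
definition, assuming Python and a helper name of my own invention: the
same label arriving as UTF-8 or as UTF-32 yields the same sequence of
code points once decoded, so there is no reason to treat one form as
the U-label and the other as something else.

```python
def same_code_points(raw_a: bytes, encoding_a: str,
                     raw_b: bytes, encoding_b: str) -> bool:
    """Compare two byte strings by the Unicode code points they encode."""
    return raw_a.decode(encoding_a) == raw_b.decode(encoding_b)


label = "b\u00fccher"                 # "bücher", already in NFC
as_utf8 = label.encode("utf-8")
as_utf32 = label.encode("utf-32-be")  # big-endian, no byte order mark
```

The two byte strings differ, but same_code_points(as_utf8, "utf-8",
as_utf32, "utf-32-be") is true.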

While I was contemplating this, I noticed another ambiguity in this
section; apologies for not having caught it last round, although it's
related to my first suggestion above:

   To be valid, U-labels and A-labels must obey an important symmetry
   constraint.  While that constraint may be tested in any of several
   ways, an A-label must be capable of being produced by conversion from
   a U-label and a U-label must be capable of being produced by
   conversion from an A-label.  Among other things, this implies that
   both U-labels and A-labels must be strings in Unicode NFC
   [Unicode-UAX15] normalized form.  These strings MUST contain only
   characters specified elsewhere in this document series, and only in
   the contexts indicated as appropriate.

This passage nowhere actually says that _the_ A-label produced by
conversion from a particular U-label must in turn produce, by the
application of the algorithm, the _same_ U-label.  There is a
symmetry (though not an obvious one) in U[1] being convertible to A[2]
which is convertible to U[2] which is convertible to A[1], for
instance.  I have no idea whether such is possible, but there's no
reason our formal definitions need to allow for it.  This can be fixed
so:

   To be valid, U-labels and A-labels must obey an important symmetry
   constraint.  While that constraint may be tested in any of several
   ways, an A-label A' must be capable of being produced by conversion from
   a U-label U', and that U-label U' must be capable of being produced by
   conversion from A-label A'.  Among other things, this implies that
   both U-labels and A-labels must be strings in Unicode NFC
   [Unicode-UAX15] normalized form.  These strings MUST contain only
   characters specified elsewhere in this document series, and only in
   the contexts indicated as appropriate.

I don't care about the notation, as long as it is unambiguously clear
that we're always talking about the "very same" label on both sides of
the transformation.  We could go so far as to define IDNA-equivalent
A-labels and U-labels formally.  I think this would do it: 

    A-label1 and U-label1 are equivalent if and only if all the
    following four conditions are true:

        1.  The decoding of A-label1 according to [RFC3492] results in
        U-label1.

        2.  The encoding of U-label2 according to [RFC3492] results in
        A-label2.

        3.  A-label1 is equivalent to A-label2 according to DNS
        matching rules for labels.

        4.  U-label1 is bitstring equivalent to U-label2.
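
The round-trip idea behind these conditions can be sketched as follows.
This is an illustration under stated assumptions, not a proposed
implementation: it uses Python's standard "punycode" codec for the
[RFC3492] steps, the function names are hypothetical, ASCII case
folding stands in for the DNS matching rules, and it does not check NFC
or IDNA-validity at all.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Punycode-encode a Unicode label and prepend the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder."""
    folded = a_label.lower()  # stand-in for DNS case-insensitive matching
    if not folded.startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    return folded[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def idna_equivalent(a_label: str, u_label: str) -> bool:
    """True only if each label converts to the very same other label."""
    try:
        decoded = to_u_label(a_label)
        encoded = to_a_label(u_label)
    except (ValueError, UnicodeError):
        return False
    # A-labels compared as DNS labels; U-labels compared exactly.
    return encoded.lower() == a_label.lower() and decoded == u_label
```

With this sketch, idna_equivalent("xn--bcher-kva", "b\u00fccher") holds,
while pairing the same A-label with any other Unicode string does not.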

Some may reject the above as a bit of needless formalism, or want to
reduce some steps.  I argue that this is the most basic and therefore
most clear (but admittedly inelegant) formulation.  As usual, however,
I'm utterly prepared to admit that I've actually got the rule
incorrect.  But if I have, that amounts to a hint of trouble with the
document, because I've managed to misunderstand it (and though I be
dim, I have been following this effort).

Finally, while I get the point of, "As is usual with IETF
specifications, while the document represents rough consensus, it
should not be assumed that all participants and contributors agree
with all provisions," I don't feel comfortable with starting to make
the Acknowledgements section a platform for disclaimers about WG
consensus.  I object pretty strongly to this addition.  I don't think
we're served well by trying to state in any document how rough the
rough consensus is: the document either has to stand through the IETF
process, or not.  Besides, this evaluation is a prerogative of the
Chair, the ADs, and the IESG.  If this sort of disclaimer is needed,
it ought to be added by the IESG (and even then I would object). I
would like the sentence to be removed.

Apart from the above, I think this document is in good shape on its
own.  Because I've been late in getting to the documents under LC, I
am not yet in a position to respond to this document in the context of
all the other documents.  Therefore, I will not say yet
unconditionally that I support the document advancing.  With that
reservation, I think the document in isolation is ready to go, and I
have not yet thought of an issue across documents that might make me
rethink my view.

Best regards,

Andrew

-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.

