Comments on draft-ietf-idnabis-defs-10

Wed Aug 26 03:53:02 CEST 2009

At 11:24 PM -0400 8/24/09, Andrew Sullivan wrote:
>Dear colleagues,
>
>As part of an effort to respond to the current WGLC of multiple
>documents, I have read draft-ietf-idnabis-defs-10.  I am grateful to
>the Chair for extending the deadline, and apologetic to the editors
>that they've been made to wait.  I hope to be able to offer any useful
>comments I might have (assuming there are such) before the new
>deadline. 
>
>In a previous comment (see
>http://www.alvestrand.no/pipermail/idna-update/2009-July/004970.html),
>I made a vague remark about something I find worrisome in this text
>in §2.3.2.1:
>
>   o  An "A-label" is the ASCII-Compatible Encoding (ACE, see
>      Section 2.3.2.5) form of an IDNA-valid string.  It must be a
>      complete label: IDNA is defined for labels, not for parts of them
>      and not for complete domain names.  This means, by definition,
>      that every A-label will begin with the IDNA ACE prefix, "xn--"
>      (see Section 2.3.2.5), followed by a string that is a valid output
>      of the Punycode algorithm [RFC3492] and hence a maximum of 59
>      ASCII characters in length.  The prefix and string together must
>      conform to all requirements for a label that can be stored in the
>      DNS including conformance to the rules for the preferred form
>      described in RFC 1034, RFC 1035, and RFC 1123.  A string meeting
>      the above requirements is still not an A-label unless it can be
>      decoded into a U-label.
>
>So, to be less vague: the section is supposed to define certain terms,
>and that bullet ought to define "A-label".  It does not.  It tells us
>the necessary conditions for being an A-label, but not the sufficient.
>This could be remedied if the last sentence said instead, "If and only
>if a string meeting the above requirements can be decoded into a
>U-label, then it is an A-label." 

Sounds right to me.

>But I'm no longer sure that's true,
>given that we've lived with the I-D definition so long and yet not had
>it fully operationalized. 

I'm not so worried: your proposed change sounds like what I would have expected.

>Is there anything else?  If there is, it
>needs to be added. 

Agree, but I think that your change is sufficient.
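
For implementers, here is a minimal Python sketch of how the necessary conditions in the bullet plus the proposed "if and only if" decoding step might be tested. The function name is hypothetical, and the sketch assumes Python's built-in "punycode" codec as the RFC 3492 implementation. It does not check full IDNA-validity of the decoded string (the permitted-character and context rules live elsewhere in the document series), so it only approximates "can be decoded into a U-label".

```python
import re
import unicodedata

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS label limit; leaves 59 characters after the prefix

# Preferred-form (LDH) shape: letters, digits, hyphens, no hyphen at either end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_a_label_candidate(label: str) -> bool:
    """Hypothetical helper: structural test for an A-label.

    Assumption: IDNA-validity of the decoded string is NOT checked here.
    """
    if label[:len(ACE_PREFIX)].lower() != ACE_PREFIX:
        return False
    if len(label) > MAX_LABEL_LENGTH or not LDH_LABEL.fullmatch(label):
        return False
    try:
        # Strip the prefix, then run the Punycode decoder on the remainder.
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # A U-label is in NFC and contains at least one non-ASCII character.
    in_nfc = unicodedata.normalize("NFC", decoded) == decoded
    has_non_ascii = any(ord(ch) > 0x7F for ch in decoded)
    return in_nfc and has_non_ascii
```

A string that fails any one of these tests is not an A-label. A string that passes all of them would still need the IDNA-validity checks before it could be called one.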

>These definitions, I say, must be completely
>operationalized (or else we have no excuse to call this document the
>definitions document).  Since people have to write code on the basis
>of these definitions, they must be completely unambiguous.
>
>In the message to which I referred above, I also objected to this:
>
>  o  A "U-label" is an IDNA-valid string of Unicode characters, in
>      normalization form NFC and including at least one non-ASCII
>      character, expressed in a standard Unicode Encoding Form (in an
>      Internet transmission context this will normally be UTF-8).
>
>The parenthetical remark, I think, encourages implementers not to
>recognise as U-labels strings that come in as (say) UTF-32, but that
>are otherwise perfectly valid.  Who cares what is normal in an
>Internet transmission context, when we're defining terms?  Why does
>that matter?

I think replacing the parenthetical with "(such as UTF-8)" fixes the problem.
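
To make the point concrete, here is a small Python sketch showing that the encoding form does not change which label is being talked about. The helper names are hypothetical, and the shape test covers only the NFC and non-ASCII parts of the definition, not IDNA-validity.

```python
import unicodedata


def decode_label(raw: bytes, encoding_form: str) -> str:
    """Turn bytes in a given Unicode encoding form into code points."""
    return raw.decode(encoding_form)


def has_u_label_shape(label: str) -> bool:
    """Hypothetical check: NFC and at least one non-ASCII character.

    Assumption: IDNA-validity of the characters is not tested here.
    """
    in_nfc = unicodedata.normalize("NFC", label) == label
    has_non_ascii = any(ord(ch) > 0x7F for ch in label)
    return in_nfc and has_non_ascii


label = "b\u00fccher"
# The same code points come back whichever encoding form carried them.
for form in ("utf-8", "utf-16-be", "utf-32-be"):
    assert decode_label(label.encode(form), form) == label
```

Under this reading, a UTF-32 string and a UTF-8 string that decode to the same code points are the same U-label candidate.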

>While I was contemplating this, I noticed another ambiguity in this
>section; apologies for not having caught it last round, although it's
>related to my first suggestion above:
>
>   To be valid, U-labels and A-labels must obey an important symmetry
>   constraint.  While that constraint may be tested in any of several
>   ways, an A-label must be capable of being produced by conversion from
>   a U-label and a U-label must be capable of being produced by
>   conversion from an A-label.  Among other things, this implies that
>   both U-labels and A-labels must be strings in Unicode NFC
>   [Unicode-UAX15] normalized form.  These strings MUST contain only
>   characters specified elsewhere in this document series, and only in
>   the contexts indicated as appropriate.
>
>This passage nowhere actually says that _the_ A-label produced by
>conversion from a particular U-label must in turn produce, by the
>application of the algorithm, the _same_ U-label.  There is a
>symmetry (though not an obvious one) in U[1] being convertible to A[2]
>which is convertible to U[2] which is convertible to A[1], for
>instance.  I have no idea whether such is possible, but there's no
>reason our formal definitions need to allow for it.  This can be fixed
>so:
>
>   To be valid, U-labels and A-labels must obey an important symmetry
>   constraint.  While that constraint may be tested in any of several
>   ways, an A-label A' must be capable of being produced by conversion from
>   a U-label U', and that U-label U' must be capable of being produced by
>   conversion from A-label A'.  Among other things, this implies that
>   both U-labels and A-labels must be strings in Unicode NFC
>   [Unicode-UAX15] normalized form.  These strings MUST contain only
>   characters specified elsewhere in this document series, and only in
>   the contexts indicated as appropriate.

Sounds good. I doubt anyone in the WG meant to allow U1->A2->U2->A1 as making the label valid.

>I don't care about the notation, as long as it is unambiguously clear
>that we're always talking about the "very same" label on both sides of
>the transformation. 

I prefer
   ways, an A-label "A1" must be capable of being produced by conversion from
   a U-label "U1", and that U-label U1 must be capable of being produced by
   conversion from A-label A1. ...
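
As one of the "several ways" the constraint could be tested, here is a rough Python sketch of the A1 -> U1 -> A1 round trip. The function names are hypothetical, Python's built-in "punycode" codec stands in for RFC 3492, and the comparison of A-labels is assumed to be ASCII case-insensitive. It does not check that U1 is IDNA-valid.

```python
ACE_PREFIX = "xn--"


def to_u(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def to_a(u_label: str) -> str:
    """Punycode-encode a Unicode label and prepend the ACE prefix."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def round_trips(a1: str) -> bool:
    """True if A1 decodes to some U1 and that same U1 encodes back to A1."""
    if a1[:len(ACE_PREFIX)].lower() != ACE_PREFIX:
        return False
    try:
        u1 = to_u(a1)
        back = to_a(u1)
    except UnicodeError:
        # Non-ASCII input or malformed Punycode: no U1 exists.
        return False
    # Assumption: A-labels are compared case-insensitively, as in the DNS.
    return back.lower() == a1.lower()
```

The point of the sketch is only that the same pair (A1, U1) appears on both sides of the transformation, so a chain through a different pair cannot satisfy it.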

>We could go so far as to define IDNA-equivalent
>A-labels and U-labels formally.  I think this would do it:
>
>    A-label1 and U-label1 are equivalent if and only if all the
>    following four conditions are true:
>
>        1.  The encoding of A-label1 according to [RFC3492] results in
>        U-label1.
>
>        2.  The decoding of U-label2 according to [RFC3492] results in
>        A-label2.
>
>        3.  A-label1 is equivalent to A-label2 according to DNS
>        matching rules for labels.
>
>        4.  U-label1 is bitstring equivalent to U-label2.

The first part is fine (other than needing to say "remove the 'xn--' before decoding"), but bringing in U-label2 and A-label2 is quite confusing: it is not clear where they came from. I think that this might be trying too hard.

>Some may reject the above as a bit of needless formalism, or want to
>reduce some steps.  I argue that this is the most basic and therefore
>most clear (but admittedly inelegant) formulation.  As usual, however,
>I'm utterly prepared to admit that I've actually got the rule
>incorrect.  But if I have, that amounts to a hint of trouble with the
>document, because I've managed to misunderstand it (and though I be
>dim, I have been following this effort).

I'm in the camp of "needless formalism".

>Finally, while I get the point of, "As is usual with IETF
>specifications, while the document represents rough consensus, it
>should not be assumed that all participants and contributors agree
>with all provisions," I don't feel comfortable with starting to make
>the Acknowledgements section a platform for disclaimers about WG
>consensus.  I object pretty strongly to this addition. 

Thank you for mentioning this. It irked me too, and I was glad to *not* see it in the other WG documents. It can be removed without reducing the understanding of readers.