Comments on draft-ietf-idnabis-defs-10
Paul Hoffman
phoffman at imc.org
Wed Aug 26 03:53:02 CEST 2009
At 11:24 PM -0400 8/24/09, Andrew Sullivan wrote:
>Dear colleagues,
>
>As part of an effort to respond to the current WGLC of multiple
>documents, I have read draft-ietf-idnabis-defs-10. I am grateful to
>the Chair for extending the deadline, and apologetic to the editors
>that they've been made to wait. I hope to be able to offer any useful
>comments I might have (assuming there are such) before the new
>deadline.
>
>In a previous comment (see
>http://www.alvestrand.no/pipermail/idna-update/2009-July/004970.html),
>I made a vague remark about something I find worrisome in this text
>in §2.3.2.1:
>
> o An "A-label" is the ASCII-Compatible Encoding (ACE, see
> Section 2.3.2.5) form of an IDNA-valid string. It must be a
> complete label: IDNA is defined for labels, not for parts of them
> and not for complete domain names. This means, by definition,
> that every A-label will begin with the IDNA ACE prefix, "xn--"
> (see Section 2.3.2.5), followed by a string that is a valid output
> of the Punycode algorithm [RFC3492] and hence a maximum of 59
> ASCII characters in length. The prefix and string together must
> conform to all requirements for a label that can be stored in the
> DNS including conformance to the rules for the preferred form
> described in RFC 1034, RFC 1035, and RFC 1123. A string meeting
> the above requirements is still not an A-label unless it can be
> decoded into a U-label.
>
>So, to be less vague: the section is supposed to define certain terms,
>and that bullet ought to define "A-label". It does not. It tells us
>the necessary conditions for being an A-label, but not the sufficient.
>This could be remedied if the last sentence said instead, "If and only
>if a string meeting the above requirements can be decoded into a
>U-label, then it is an A-label."
Sounds right to me.
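To make the necessary-versus-sufficient split concrete, here is a rough sketch of the necessary conditions only, using Python's built-in "punycode" codec. The name decode_candidate and the regular expression are mine, not from the draft, and a non-None result still has to be checked against the U-label rules before the label counts as an A-label under the proposed "if and only if" wording.

```python
import re

ACE_PREFIX = "xn--"
# Assumption: a simplified letter-digit-hyphen check standing in for the
# "preferred form" rules of RFC 1034, RFC 1035, and RFC 1123.
LDH = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")


def decode_candidate(label):
    """Return the Punycode-decoded string, or None if a necessary
    condition for being an A-label fails.

    A non-None result is NOT yet proof of an A-label: the decoded
    string must also be a valid U-label, which this sketch does not test.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return None
    # 63 octets is the DNS label limit: 4 for the prefix plus at most 59.
    if len(label) > 63 or not LDH.match(label):
        return None
    try:
        # The prefix is removed before decoding.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return None
```

For example, decode_candidate("xn--bcher-kva") yields "bücher", while a label without the prefix yields None.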
>But I'm no longer sure that's true,
>given that we've lived with the I-D definition so long and yet not had
>it fully operationalized.
I'm not so worried: your proposed change sounds like what I would have expected.
>Is there anything else? If there is, it
>needs to be added.
Agree, but I think that your change is sufficient.
>These definitions, I say, must be completely
>operationalized (or else we have no excuse to call this document the
>definitions document). Since people have to write code on the basis
>of these definitions, they must be completely unambiguous.
>
>In the message to which I referred above, I also objected to this:
>
> o A "U-label" is an IDNA-valid string of Unicode characters, in
> normalization form NFC and including at least one non-ASCII
> character, expressed in a standard Unicode Encoding Form (in an
> Internet transmission context this will normally be UTF-8).
>
>The parenthetical remark, I think, encourages implementers not to
>recognise as U-labels strings that come in as (say) UTF-32, but that
>are otherwise perfectly valid. Who cares what is normal in an
>Internet transmission context, when we're defining terms? Why does
>that matter?
I think replacing the parenthetical with "(such as UTF-8)" fixes the problem.
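As an illustration of why the encoding form should not matter to the definition, here is a small sketch that applies the same two checks (NFC and at least one non-ASCII character) to a label no matter which Unicode encoding form it arrived in. The function name is hypothetical, and the full IDNA-valid test is deliberately left out.

```python
import unicodedata


def looks_like_u_label(raw, encoding):
    """Check two of the U-label conditions on bytes in any encoding form.

    This sketch tests only NFC and the presence of a non-ASCII character;
    it does not test IDNA validity.
    """
    text = raw.decode(encoding)
    is_nfc = unicodedata.normalize("NFC", text) == text
    has_non_ascii = any(ord(ch) > 0x7F for ch in text)
    return is_nfc and has_non_ascii


# The same label in three encoding forms gives the same answer.
label = "b\u00fccher"
results = [looks_like_u_label(label.encode(enc), enc)
           for enc in ("utf-8", "utf-16", "utf-32")]
```

Once the bytes are decoded, the checks operate on code points, so a UTF-32 input is treated exactly like a UTF-8 one.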
>While I was contemplating this, I noticed another ambiguity in this
>section; apologies for not having caught it last round, although it's
>related to my first suggestion above:
>
> To be valid, U-labels and A-labels must obey an important symmetry
> constraint. While that constraint may be tested in any of several
> ways, an A-label must be capable of being produced by conversion from
> a U-label and a U-label must be capable of being produced by
> conversion from an A-label. Among other things, this implies that
> both U-labels and A-labels must be strings in Unicode NFC
> [Unicode-UAX15] normalized form. These strings MUST contain only
> characters specified elsewhere in this document series, and only in
> the contexts indicated as appropriate.
>
>This passage nowhere actually says that _the_ A-label produced by
>conversion from a particular U-label must in turn produce, by the
>application of the algorithm, the _same_ U-label. There is a
>symmetry (though not an obvious one) in U[1] being convertible to A[2]
>which is convertible to U[2] which is convertible to A[1], for
>instance. I have no idea whether such is possible, but there's no
>reason our formal definitions need to allow for it. This can be fixed
>so:
>
> To be valid, U-labels and A-labels must obey an important symmetry
> constraint. While that constraint may be tested in any of several
> ways, an A-label A' must be capable of being produced by conversion from
> a U-label U', and that U-label U' must be capable of being produced by
> conversion from A-label A'. Among other things, this implies that
> both U-labels and A-labels must be strings in Unicode NFC
> [Unicode-UAX15] normalized form. These strings MUST contain only
> characters specified elsewhere in this document series, and only in
> the contexts indicated as appropriate.
Sounds good. I doubt anyone in the WG meant to allow U1->A2->U2->A1 as making the label valid.
>I don't care about the notation, as long as it is unambiguously clear
>that we're always talking about the "very same" label on both sides of
>the transformation.
I prefer
ways, an A-label "A1" must be capable of being produced by conversion from
a U-label "U1", and that U-label U1 must be capable of being produced by
conversion from A-label A1. ...
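A minimal sketch of the "very same label" test, again with Python's built-in "punycode" codec. The helper names are mine, and the sketch covers only the Punycode step: it assumes U1 has already passed the IDNA validity rules, which are not checked here.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label):
    """Punycode-encode and add the ACE prefix (no validity checks)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Remove the ACE prefix and Punycode-decode (no validity checks)."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def round_trips(u1):
    """True if U1 -> A1 -> U1 and A1 -> U1 -> A1 both return the same label."""
    a1 = to_a_label(u1)
    return to_u_label(a1) == u1 and to_a_label(to_u_label(a1)) == a1
```

With U1 = "bücher", to_a_label gives "xn--bcher-kva" and round_trips returns True. A chain like U1->A2->U2->A1 would fail the first comparison.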
>We could go so far as to define IDNA-equivalent
>A-labels and U-labels formally. I think this would do it:
>
> A-label1 and U-label1 are equivalent if and only if all the
> following four conditions are true:
>
> 1. The encoding of A-label1 according to [RFC3492] results in
> U-label1.
>
> 2. The decoding of U-label2 according to [RFC3492] results in
> A-label2.
>
> 3. A-label1 is equivalent to A-label2 according to DNS
> matching rules for labels.
>
> 4. U-label1 is bitstring equivalent to U-label2.
The first part is fine (other than needing to say "remove the 'xn--' before decoding"), but bringing in U-label2 and A-label2 is quite confusing. It is not clear where they came from. I think that this might be trying too hard.
>Some may reject the above as a bit of needless formalism, or want to
>reduce some steps. I argue that this is the most basic and therefore
>most clear (but admittedly inelegant) formulation. As usual, however,
>I'm utterly prepared to admit that I've actually got the rule
>incorrect. But if I have, that amounts to a hint of trouble with the
>document, because I've managed to misunderstand it (and though I be
>dim, I have been following this effort).
I'm in the camp of "needless formalism".
>Finally, while I get the point of, "As is usual with IETF
>specifications, while the document represents rough consensus, it
>should not be assumed that all participants and contributors agree
>with all provisions," I don't feel comfortable with starting to make
>the Acknowledgements section a platform for disclaimers about WG
>consensus. I object pretty strongly to this addition.
Thank you for mentioning this. It irked me too, and I was glad to *not* see it in the other WG documents. It can be removed without reducing the understanding of readers.