Comments on draft-ietf-idnabis-defs-10

Wed Sep 2 17:39:50 CEST 2009

On Wed, Sep 2, 2009 at 6:50 AM, John C Klensin <klensin at jck.com> wrote:

>
>
> --On Tuesday, September 01, 2009 16:01 -0400 Vint Cerf
> <vint at google.com> wrote:
>
> > let's go with the "make sure the XN-label string is in
> > lowercase before converting to U-label"
>
> Ok.
>
> I've patched the changes into Protocol, making modifications
> only where conversion from A-labels to U-labels is discussed.
> I'm going to post that version for WG review as soon as I can
> get it compiled and submitted.  A version with the remaining
> pre-IETF-last call patches (mainly the filling in of Section
> numbers) will follow shortly (probably tomorrow) if either there
> are no further comments or the comments indicate that this is
> ok.   I've made no changes to Definitions -- the discussion
> doesn't seem to require them.
>
>
John,

Thank you for the quick turnaround on this and providing valuable insights,
as always.

I've reviewed protocol-15 with respect to this issue and it looks good.

As for the definitions document, I think you're right that there may not be
any changes necessary. In an earlier message, you said that some points
in Andrew's comments at the beginning of the thread have been incorporated.
I'll wait for defs-11 and review it once more.

Related note: in light of the discovery made by James that Punycode can in
fact output uppercase characters to represent encoded non-ASCII codepoints,
we could go back to idnabis-defs-10, 2.3.2.1 and further qualify "output of
the Punycode algorithm". However, since practically all implementations
produce lowercase output, I suppose that is not necessary?

> Probably Rationale should be extended to discuss this issue and
> the reasons for the "require lowercase" statement.  I'd welcome
> text on that subject and advice as to where to put it, but will
> make something up if I don't hear from people.
>
>
I'm lousy at writing such texts, but do the following bullets capture what
you intend to say?

1. Symmetry between U-labels and A-labels is a desirable property and a key
design goal of IDNA2008.
2. A-labels, being a subset of LDH-labels, are sometimes stored and used
without preserving case.
3. When that happens, we end up with uppercase characters in the
Punycode-decoded result, which makes it an invalid U-label.
4. This happens because the Punycode algorithm preserves the case of
the "basic code points" in the decoding process.
5. Because the Punycode encoding process (practically) never outputs
uppercase characters from valid U-labels, we know that a valid A-label must
not contain any uppercase characters after the "xn--" ACE prefix.
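
To make bullets 3 and 4 concrete, here is a minimal sketch using the
"punycode" codec from Python's standard library. The helper name
decode_xn_label and the sample label are my own choices for illustration
and come from none of the drafts; the helper does only the raw Punycode
step, not IDNA2008 validation.

```python
ACE_PREFIX = "xn--"


def decode_xn_label(label: str, lowercase_first: bool) -> str:
    """Punycode-decode an XN-label, optionally lowercasing it first.

    Hypothetical helper for illustration only: it performs the raw
    Punycode step and none of the other IDNA2008 validity checks.
    """
    if lowercase_first:
        label = label.lower()
    # The ACE prefix itself is matched case-insensitively.
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an XN-label: %r" % label)
    payload = label[len(ACE_PREFIX):]
    # Python's punycode codec copies the basic code points through
    # unchanged, so any uppercase ASCII letters survive decoding.
    return payload.encode("ascii").decode("punycode")


if __name__ == "__main__":
    stored = "XN--BCHER-KVA"  # a label whose case was not preserved
    print(decode_xn_label(stored, lowercase_first=False))
    print(decode_xn_label(stored, lowercase_first=True))
```

With this codec, decoding the uppercased label without lowercasing it
first leaves uppercase ASCII letters in the result, so the result cannot
be a valid U-label; lowercasing first gives back the all-lowercase form.
Other Punycode implementations may differ in details.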

=wil