Unconditional punycode conversion

Simon Josefsson simon at josefsson.org
Wed Mar 9 19:08:45 CET 2011


Andrew Sullivan <ajs at shinkuro.com> writes:

> On Wed, Mar 09, 2011 at 06:13:57PM +0100, Simon Josefsson wrote:
>> Andrew Sullivan <ajs at shinkuro.com> writes:
>> 
>> > On Wed, Mar 09, 2011 at 05:36:10PM +0100, Simon Josefsson wrote:
>> >> To verify my understanding: the label "ab--cd" is permitted by IDNA2008
>> >> despite it having "--" in the third and fourth character positions?
>> >> That would be because section 5.4 only applies to non-ascii labels.
>> >
>> > No.  See section 2.3.1 of RFC 5890.
>> 
>> I don't see any MUST/SHOULD language there.  RFC 5891 says:
>> 
>>    Putative U-labels with any of the
>>    following characteristics MUST be rejected prior to DNS lookup:
>> ...
>>    o  Labels containing "--" (two consecutive hyphens) in the third and
>>       fourth character positions.
>> 
>> Is "ab--cd" a putative U-label?
>
> Certainly not.  It has no high bits.  But,
>
>    To facilitate clear description, two new subsets of LDH labels are
>    created by the introduction of IDNA.  These are called Reserved LDH
>    labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
>    Reserved LDH labels, known as "tagged domain names" in some other
>    contexts, have the property that they contain "--" in the third and
>    fourth characters but which otherwise conform to LDH label rules.
>    Only a subset of the R-LDH labels can be used in IDNA-aware
>    applications.  That subset consists of the class of labels that begin
>    with the prefix "xn--" (case independent), but otherwise conform to
>    the rules for LDH labels.  That subset is called "XN-labels" in this
>    set of documents.  XN-labels are further divided into those whose
>    remaining characters (after the "xn--") are valid output of the
>    Punycode algorithm [RFC3492] and those that are not (see below).  The
>    XN-labels that are valid Punycode output are known as "A-labels" if
>    they also meet the other criteria for IDNA-validity described below.
>    Because LDH labels (and, indeed, any DNS label) must not be more than
>    63 octets in length, the portion of an XN-label derived from the
>    Punycode algorithm is limited to no more than 59 ASCII characters.
>    Non-Reserved LDH labels are the set of valid LDH labels that do not
>    have "--" in the third and fourth positions.
>
> So, according to the above, NR-LDH labels never have -- in position 3
> and 4.  So ab--cd must be an R-LDH label.
>
> Of the R-LDH labels, only XN-labels are possibly A-labels.
>
> Only A-labels and U-labels are allowed under IDNA2008 (or NR-LDH
> labels, but those aren't actually subject to IDNA2008 of course).
>
> This is illustrated in Figure 1 in RFC 5890, although only fans of
> Venn diagrams (of which I am one) will find it helpful.

I don't see any of this reflected in RFC 5891.  As far as I can tell,
"ab--cd" is permitted since there is no rule to forbid it.

Is a new rule needed to forbid "ab--cd" in RFC 5891 or is there an error
in the existing "--" rule for U-labels, or something else?
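
To make the gap concrete, here is a rough Python sketch of the RFC 5890
section 2.3.1 taxonomy quoted above, next to the "--" rule as it has
been read in this thread (applying only to putative U-labels, i.e.
labels with non-ASCII characters).  The function names and category
strings are my own invention for illustration, not from any RFC or
library, and the sketch does not check whether an XN-label is valid
Punycode output, so it cannot tell whether an XN-label is an A-label.

```python
import re

# LDH syntax: ASCII letters, digits and hyphens, 1 to 63 octets,
# with no leading or trailing hyphen.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_label(label: str) -> str:
    """Place a label in the RFC 5890 section 2.3.1 taxonomy (sketch).

    The returned strings are illustrative names, not RFC terminology
    beyond the NR-LDH / R-LDH / XN-label distinction.
    """
    if not label.isascii():
        return "non-ASCII (putative U-label)"
    if not _LDH.fullmatch(label):
        return "not an LDH label"
    if label[2:4] != "--":
        return "NR-LDH"
    if label[:4].lower() == "xn--":
        # Could be an A-label or a fake one; Punycode validity is
        # deliberately not checked here.
        return "XN-label (R-LDH)"
    return "R-LDH (not XN)"


def hyphen_rule_applies(label: str) -> bool:
    """The "--" rule under the reading discussed in this thread.

    Assumption: the rule only covers putative U-labels, taken here
    to mean labels containing at least one non-ASCII character.
    """
    return (not label.isascii()) and label[2:4] == "--"
```

With this, classify_label("ab--cd") gives "R-LDH (not XN)" while
hyphen_rule_applies("ab--cd") is False: the label is reserved by the
definitions, yet the lookup rule as read above does not reject it.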

/Simon

