Unconditional punycode conversion
simon at josefsson.org
Wed Mar 9 19:08:45 CET 2011
Andrew Sullivan <ajs at shinkuro.com> writes:
> On Wed, Mar 09, 2011 at 06:13:57PM +0100, Simon Josefsson wrote:
>> Andrew Sullivan <ajs at shinkuro.com> writes:
>> > On Wed, Mar 09, 2011 at 05:36:10PM +0100, Simon Josefsson wrote:
>> >> To verify my understanding: the label "ab--cd" is permitted by IDNA2008
>> >> despite it having "--" in the third and fourth character positions?
>> >> That would be because section 5.4 only applies to non-ascii labels.
>> > No. See section 2.3.1 of RFC 5890.
>> I don't see any MUST/SHOULD language there. RFC 5891 says:
>> Putative U-labels with any of the
>> following characteristics MUST be rejected prior to DNS lookup:
>> o Labels containing "--" (two consecutive hyphens) in the third and
>> fourth character positions.
>> Is "ab--cd" a putative U-label?
> Certainly not. It has no high bits. But,
> To facilitate clear description, two new subsets of LDH labels are
> created by the introduction of IDNA. These are called Reserved LDH
> labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
> Reserved LDH labels, known as "tagged domain names" in some other
> contexts, have the property that they contain "--" in the third and
> fourth characters but which otherwise conform to LDH label rules.
> Only a subset of the R-LDH labels can be used in IDNA-aware
> applications. That subset consists of the class of labels that begin
> with the prefix "xn--" (case independent), but otherwise conform to
> the rules for LDH labels. That subset is called "XN-labels" in this
> set of documents. XN-labels are further divided into those whose
> remaining characters (after the "xn--") are valid output of the
> Punycode algorithm [RFC3492] and those that are not (see below). The
> XN-labels that are valid Punycode output are known as "A-labels" if
> they also meet the other criteria for IDNA-validity described below.
> Because LDH labels (and, indeed, any DNS label) must not be more than
> 63 octets in length, the portion of an XN-label derived from the
> Punycode algorithm is limited to no more than 59 ASCII characters.
> Non-Reserved LDH labels are the set of valid LDH labels that do not
> have "--" in the third and fourth positions.
> So, according to the above, NR-LDH labels never have -- in position 3
> and 4. So ab--cd must be an R-LDH label.
> Of the R-LDH labels, only XN-labels are possibly A-labels.
> Only A-labels and U-labels are allowed under IDNA2008 (or NR-LDH
> labels, but those aren't actually subject to IDNA2008 of course).
> This is illustrated in Figure 1 in RFC 5890, although only fans of
> Venn diagrams (of which I am one) will find it helpful.
I don't see any of this reflected in RFC 5891. As far as I can tell,
"ab--cd" is permitted since there is no rule to forbid it.
Is a new rule needed in RFC 5891 to forbid "ab--cd", is there an error
in the existing "--" rule for U-labels, or is it something else?
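
As a rough illustration, the RFC 5890 section 2.3.1 taxonomy quoted above
can be sketched in Python. This is a minimal sketch, not taken from any of
the RFCs or from an existing library: the function name classify_label and
the returned strings are invented for this example. It only mirrors the
definitions in RFC 5890, makes no attempt to decide whether an XN-label is
a valid A-label (that would need Punycode decoding and the other
IDNA-validity checks), and does not settle what RFC 5891 requires.

```python
import re

# LDH label: ASCII letters, digits and hyphens, 1 to 63 octets,
# with no hyphen in the first or last position.
_LDH_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Place a single DNS label in the RFC 5890 section 2.3.1 taxonomy.

    The category names returned here are hypothetical strings chosen
    for this sketch.
    """
    if not label.isascii():
        # Non-ASCII labels are candidates for U-labels, not LDH labels.
        return "non-ASCII (putative U-label)"
    if _LDH_PATTERN.fullmatch(label) is None:
        return "not an LDH label"
    if label[2:4] == "--":
        # Reserved LDH label: "--" in the third and fourth positions.
        if label[:4].lower() == "xn--":
            # The prefix is case independent; A-label validity is not checked.
            return "XN-label"
        return "R-LDH (non-XN)"
    return "NR-LDH"


if __name__ == "__main__":
    for candidate in ("example", "ab--cd", "xn--bcher-kva", "XN--abc", "-bad"):
        print(candidate, "->", classify_label(candidate))
```

Under these definitions "ab--cd" lands in the R-LDH category without
being an XN-label, which is the case the question above is about.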