Unconditional punycode conversion

Andrew Sullivan ajs at shinkuro.com
Wed Mar 9 18:22:59 CET 2011


On Wed, Mar 09, 2011 at 06:13:57PM +0100, Simon Josefsson wrote:
> Andrew Sullivan <ajs at shinkuro.com> writes:
> 
> > On Wed, Mar 09, 2011 at 05:36:10PM +0100, Simon Josefsson wrote:
> >> To verify my understanding: the label "ab--cd" is permitted by IDNA2008
> >> despite it having "--" in the third and fourth character positions?
> >> That would be because section 5.4 only applies to non-ascii labels.
> >
> > No.  See section 2.3.1 of RFC 5890.
> 
> I don't see any MUST/SHOULD language there.  RFC 5891 says:
> 
>    Putative U-labels with any of the
>    following characteristics MUST be rejected prior to DNS lookup:
> ...
>    o  Labels containing "--" (two consecutive hyphens) in the third and
>       fourth character positions.
> 
> Is "ab--cd" a putative U-label?

Certainly not.  It is all ASCII (no high bits), so it cannot be a
U-label.  But section 2.3.1 of RFC 5890 says:

   To facilitate clear description, two new subsets of LDH labels are
   created by the introduction of IDNA.  These are called Reserved LDH
   labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
   Reserved LDH labels, known as "tagged domain names" in some other
   contexts, have the property that they contain "--" in the third and
   fourth characters but which otherwise conform to LDH label rules.
   Only a subset of the R-LDH labels can be used in IDNA-aware
   applications.  That subset consists of the class of labels that begin
   with the prefix "xn--" (case independent), but otherwise conform to
   the rules for LDH labels.  That subset is called "XN-labels" in this
   set of documents.  XN-labels are further divided into those whose
   remaining characters (after the "xn--") are valid output of the
   Punycode algorithm [RFC3492] and those that are not (see below).  The
   XN-labels that are valid Punycode output are known as "A-labels" if
   they also meet the other criteria for IDNA-validity described below.
   Because LDH labels (and, indeed, any DNS label) must not be more than
   63 octets in length, the portion of an XN-label derived from the
   Punycode algorithm is limited to no more than 59 ASCII characters.
   Non-Reserved LDH labels are the set of valid LDH labels that do not
   have "--" in the third and fourth positions.

So, according to the above, NR-LDH labels never have "--" in positions
3 and 4, which means "ab--cd" must be an R-LDH label.

Of the R-LDH labels, only XN-labels are possibly A-labels.  Since
"ab--cd" does not begin with "xn--", it is not an XN-label and so
cannot be an A-label.

Only A-labels and U-labels are allowed under IDNA2008 (or NR-LDH
labels, but those aren't actually subject to IDNA2008 of course).

This is illustrated in Figure 1 in RFC 5890, although only fans of
Venn diagrams (of which I am one) will find it helpful.
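
For those who prefer code to Venn diagrams, here is a rough sketch of
the same taxonomy in Python.  The function name classify_label and the
strings it returns are my own illustration and do not come from RFC
5890.  It only sorts ASCII labels into the LDH subsets described above.
It does not check whether an XN-label is valid Punycode output or meets
the other IDNA-validity criteria, so "XN-label" here means "possibly an
A-label" and nothing more.

```python
import re

# LDH label: ASCII letters, digits and hyphens, no leading or trailing
# hyphen, and at most 63 octets (1 + up to 61 + 1 characters).
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Sort a single label into the LDH subsets of RFC 5890, section 2.3.1.

    Illustrative only: the category strings are hypothetical names.
    """
    if not _LDH.fullmatch(label):
        # Non-ASCII labels (putative U-labels) and malformed labels land here.
        return "not-LDH"
    if label[2:4] != "--":
        # No "--" in the third and fourth positions: Non-Reserved LDH.
        return "NR-LDH"
    if label[:4].lower() == "xn--":
        # Reserved, with the IDNA prefix (case independent).
        # Whether it is an A-label needs further checks not done here.
        return "XN-label"
    # Reserved, but not usable in IDNA-aware applications.
    return "R-LDH (not XN)"


if __name__ == "__main__":
    for label in ("example", "ab--cd", "xn--bcher-kva", "XN--abc"):
        print(label, "->", classify_label(label))
```

With this sketch, "ab--cd" falls into the reserved-but-not-XN bucket,
which is the case discussed in this thread.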

A

-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.

