Unconditional punycode conversion
Andrew Sullivan
ajs at shinkuro.com
Wed Mar 9 18:22:59 CET 2011
On Wed, Mar 09, 2011 at 06:13:57PM +0100, Simon Josefsson wrote:
> Andrew Sullivan <ajs at shinkuro.com> writes:
>
> > On Wed, Mar 09, 2011 at 05:36:10PM +0100, Simon Josefsson wrote:
> >> To verify my understanding: the label "ab--cd" is permitted by IDNA2008
> >> despite it having "--" in the third and fourth character positions?
> >> That would be because section 5.4 only applies to non-ascii labels.
> >
> > No. See section 2.3.1 of RFC 5890.
>
> I don't see any MUST/SHOULD language there. RFC 5891 says:
>
> Putative U-labels with any of the
> following characteristics MUST be rejected prior to DNS lookup:
> ...
> o Labels containing "--" (two consecutive hyphens) in the third and
> fourth character positions.
>
> Is "ab--cd" a putative U-label?
Certainly not. It is pure ASCII (no octet has the high bit set), so it
cannot be a U-label. But section 2.3.1 of RFC 5890 says:

   To facilitate clear description, two new subsets of LDH labels are
   created by the introduction of IDNA. These are called Reserved LDH
   labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
   Reserved LDH labels, known as "tagged domain names" in some other
   contexts, have the property that they contain "--" in the third and
   fourth characters but which otherwise conform to LDH label rules.
   Only a subset of the R-LDH labels can be used in IDNA-aware
   applications. That subset consists of the class of labels that begin
   with the prefix "xn--" (case independent), but otherwise conform to
   the rules for LDH labels. That subset is called "XN-labels" in this
   set of documents. XN-labels are further divided into those whose
   remaining characters (after the "xn--") are valid output of the
   Punycode algorithm [RFC3492] and those that are not (see below). The
   XN-labels that are valid Punycode output are known as "A-labels" if
   they also meet the other criteria for IDNA-validity described below.
   Because LDH labels (and, indeed, any DNS label) must not be more than
   63 octets in length, the portion of an XN-label derived from the
   Punycode algorithm is limited to no more than 59 ASCII characters.
   Non-Reserved LDH labels are the set of valid LDH labels that do not
   have "--" in the third and fourth positions.

So, according to the above, NR-LDH labels never have "--" in positions 3
and 4. So ab--cd must be an R-LDH label.
Of the R-LDH labels, only XN-labels are possibly A-labels, and ab--cd
does not begin with "xn--", so it is not an XN-label and cannot be an
A-label.
Only A-labels and U-labels are allowed under IDNA2008 (or NR-LDH
labels, but those aren't actually subject to IDNA2008 of course), so
ab--cd is not permitted.
This is illustrated in Figure 1 in RFC 5890, although only fans of
Venn diagrams (of which I am one) will find it helpful.
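
For those who prefer code to Venn diagrams, the partition of LDH labels described above can be sketched roughly as follows. This is only an illustration: the function name classify_label and the strings it returns are made up for this example and are not defined by any RFC. The XN-label branch also does not check that the remainder is valid Punycode output or that the label meets the other A-label criteria.

```python
import re

# LDH label rules as assumed here: ASCII letters, digits and hyphens only,
# 1 to 63 octets long, with no leading or trailing hyphen.
_LDH_PATTERN = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def classify_label(label: str) -> str:
    """Place a single DNS label in the RFC 5890 section 2.3.1 partition.

    Hypothetical helper for illustration; the return values are labels
    chosen for this sketch, not terms with any normative status.
    """
    if not _LDH_PATTERN.fullmatch(label):
        # Includes anything non-ASCII, e.g. a putative U-label.
        return "not-LDH"
    if label[2:4] != "--":
        # No "--" in the third and fourth positions.
        return "NR-LDH"
    if label[:4].lower() == "xn--":
        # Candidate A-label; Punycode validity is NOT checked here.
        return "XN-label"
    # Reserved, but not usable by IDNA-aware applications.
    return "R-LDH"


if __name__ == "__main__":
    for candidate in ("example", "ab--cd", "xn--bcher-kva", "-bad"):
        print(candidate, "->", classify_label(candidate))
```

Under these assumptions, classify_label("ab--cd") returns "R-LDH", which is the case discussed above: reserved, but neither an XN-label nor an NR-LDH label.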
A
--
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.