Definitions limit on label length in UTF-8

John C Klensin klensin at jck.com
Fri Sep 11 17:47:09 CEST 2009



--On Friday, September 11, 2009 17:37 +0900 "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:

> 
>> (John claimed that the email context required such a
>> rule, but I did not bother to confirm that.)
> 
> Given dinosaur implementations such as sendmail, I can
> understand the concern that some SMTP implementations may not
> easily be upgradable to use domain names with more than 255
> octets or labels with more than 63 octets. In that case, I
> would have expected at least a security warning at
> http://tools.ietf.org/html/rfc4952#section-9 (EAI is currently
> written in terms of IDNA2003, and so there are no length
> restrictions on U-labels).

I obviously have not been explaining this very well.  The
problem is not "dinosaur implementations" but a combination of
two things (which interact):

(1) Late resolution of strings, possibly through APIs that
resolve names in places that may not be the public DNS.
Systems using those APIs may keep strings in UTF-8 until very
late in the process, even passing the UTF-8 strings into the
interface or converting them to ACE form just before calling the
interface.  Either way, because other systems have come to rely
on the 63 octet limit, strings longer than 63 characters pose a
risk of unexpected problems.  The issues with this are better
explained in draft-iab-idn-encoding-00.txt, which I would
strongly encourage people in this WG to go read.
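
To make the length difference between forms concrete, here is a
minimal Python sketch that reports the length of one label in code
points, in UTF-8 octets, and in ACE form.  The function name
label_lengths is made up for this illustration, and the ACE form is
built from raw Punycode plus the "xn--" prefix with no IDNA mapping
or validation, so treat it as an approximation rather than a
conforming implementation.

```python
def label_lengths(label: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 octets, ACE octets) for one label."""
    utf8 = label.encode("utf-8")
    if label.isascii():
        ace = utf8  # an all-ASCII label is passed through unchanged
    else:
        # Assumption: raw Punycode plus "xn--"; no IDNA mapping or checks.
        ace = b"xn--" + label.encode("punycode")
    return len(label), len(utf8), len(ace)


print(label_lengths("bücher"))  # (6, 7, 13): the ACE form is xn--bcher-kva
```

A label of 40 two-octet characters, for example, is 40 characters
but 80 octets in UTF-8, so the answer to "is it over 63?" depends on
which form the code happens to be holding at that moment.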

(2) The "conversion of DNS name formats" issue that has been
extensively discussed as part of the question of alternate label
separators (sometimes described in our discussions as
"dot-oids").  Applications that use domain names, including
domain names that are not going to be resolved (or even looked
up), must be able to convert freely and accurately between
DNS-external (dot-separated labels) and DNS-internal
(length-string pairs) formats _without_ knowing whether they are
IDNs or not.  As discussed earlier, one of several reasons for
that requirement is that labels in non-IDNA-aware applications
or contexts may be perfectly valid as far as the DNS is
concerned, because the only restriction the DNS (and the normal
label type) imposes is "octets".  That
length-string format has a hard limit of 63 octets that can
be exceeded only if one can figure out how to get a larger
number into six bits (see RFC 1035, first paragraph of Section
3.1, and elsewhere).  If we permit longer U-label strings on the
theory that the only important restriction is on A-labels, we
introduce new error states into the format conversion process.
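
The conversion described in (2) can be sketched roughly as follows
in Python.  The names to_wire and from_wire are hypothetical, and
the sketch ignores escaped dots, name compression, and the overall
limit on name length; it is only meant to show where an over-long
label turns into an error state regardless of whether the label is
an IDN.

```python
MAX_LABEL_OCTETS = 63  # largest value that fits in the six-bit length field


def to_wire(name: bytes) -> bytes:
    """Dot-separated labels (raw octets) -> length-prefixed labels."""
    out = bytearray()
    for label in name.rstrip(b".").split(b"."):
        if not 0 < len(label) <= MAX_LABEL_OCTETS:
            raise ValueError(f"label of {len(label)} octets is outside 1..63")
        out.append(len(label))  # length octet; top two bits stay zero
        out += label            # label content is opaque octets
    out.append(0)               # zero-length root label ends the name
    return bytes(out)


def from_wire(wire: bytes) -> bytes:
    """Length-prefixed labels -> dot-separated labels (raw octets)."""
    labels = []
    i = 0
    while wire[i] != 0:
        n = wire[i]
        if n > MAX_LABEL_OCTETS:
            raise ValueError("length octet is not a normal label")
        labels.append(wire[i + 1 : i + 1 + n])
        i += 1 + n
    return b".".join(labels)


print(to_wire(b"www.example.com"))  # b'\x03www\x07example\x03com\x00'
```

Passing a UTF-8 label of 32 two-octet characters (64 octets) to
to_wire raises ValueError, even though the same label might well
fit in 63 octets once converted to an A-label.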

If this needs more explanation somewhere (possibly in
Rationale), I'm happy to try to do that.  But I think
eliminating the restriction would cause far more problems than
it is worth.

I note that, while I haven't had time to respond, some of the
discussion on the IRI list has included an argument that domain
names in URIs cannot be restricted to A-label forms but must
include %-escaped UTF-8 simply because those strings might not
be public-DNS domain names but references to some other database
or DNS environment.  It seems to me that one cannot have it
both ways -- either the application knows whether a string is a
public DNS reference that must conform _only_ to IDNA
requirements (but then can be restricted to A-labels) or the
application does not know and therefore must conform to DNS
requirements for label lengths.  For our purposes, the only
sensible way, at least IMO, to deal with this is to require
conformance to both sets of rules, i.e., 63 character maximum
for A-labels and 63 character maximum for U-labels.

   john


