Potential Erratum re. length limits in RFC 5890

John C Klensin klensin at jck.com
Tue Sep 28 15:20:21 CEST 2010



--On Tuesday, September 28, 2010 06:07 -0400 Vint Cerf
<vint at google.com> wrote:

> martin, i see the point and it appears to me that the
> "characters" language was intended to mean "octets" as you say.
> 
> does anyone else disagree?

We actually discussed this at some length and, I thought,
reached fairly general agreement.  In one way of looking at
things, there is no restriction, certainly no DNS restriction,
on the length of U-labels at all (in any units) -- as far as the
DNS is concerned, U-labels are an abstraction that is irrelevant
to the DNS itself.  Limits on the length of FQDN-like structures
that contain U-labels may be imposed by other protocols such as
the IRI one or EAI work, but are not DNS requirements.  Such
limits may also be imposed by display limitations, at which
point there may be a relationship to those DNS limits.  Remember
too that the IDNA2008 specs were deliberately written to say as
little as possible about full domain names rather than labels
and to be relatively agnostic about how characters were encoded.

So...

(1) If we have an actual DNS FQDN (a mixture of zero or more
each of traditional LDH labels, A-labels, and arbitrary octets),
then the limit is 63 octets per label and 252 octets for the
domain name.  That limit is imposed by RFC 1034/1035; if the
IDNA documents say anything about it, it is just repetition for
clarity.
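
As an illustration of the per-label limit in (1), here is a minimal
Python sketch. It is not taken from any of the RFCs; the function name
oversized_labels is made up for this example, and it assumes every
label is an LDH label or an A-label, so that one character is exactly
one octet.

```python
from typing import List

MAX_LABEL_OCTETS = 63  # per-label limit from RFC 1034/1035


def oversized_labels(name: str) -> List[str]:
    """Return the labels of a dotted name that exceed 63 octets.

    Assumption: labels are ASCII (LDH labels or A-labels); a label
    containing non-ASCII characters raises UnicodeEncodeError.
    """
    # Drop one optional trailing dot (the root), then split on dots.
    labels = name.rstrip(".").split(".")
    return [lab for lab in labels
            if len(lab.encode("ascii")) > MAX_LABEL_OCTETS]


if __name__ == "__main__":
    print(oversized_labels("www.example.com"))
    print(oversized_labels("a" * 64 + ".example"))
```

The check is done on the encoded octets rather than on the string
length, because the RFC 1034/1035 limit is stated in octets.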

(2) If we have, to coin yet another term, an IDN-ized FQDN (a
mixture of the above and U-labels, possibly with additional
restrictions), then we have, to preserve the DNS relationship,
an upper limit of 63 _characters_ per label (with the
understanding that the term "character" is a little ambiguous
when a single user-abstract character is constructed from base
characters and zero or more combining characters).  A 63
user-abstract character limit is an upper limit that is unlikely
to be reached if non-ASCII characters are present (impossible in
the Unicode encoding), but the WG strongly rejected earlier text
that imposed octet limits on the length of U-labels.
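
To make the question of units concrete, the following Python sketch
measures one label in several ways. It is only an illustration under
stated assumptions: the function name label_lengths is hypothetical,
the label is NFC-normalized first, and the count of non-combining
code points is a crude stand-in for "user-abstract characters" (it is
not full grapheme segmentation).

```python
import unicodedata


def label_lengths(label: str) -> dict:
    """Measure a single label in several different units."""
    nfc = unicodedata.normalize("NFC", label)
    if nfc.isascii():
        # Plain ASCII label: it goes into the DNS as-is.
        dns_form = nfc.encode("ascii")
    else:
        # Assumption: A-label = "xn--" prefix + Punycode of the label.
        dns_form = b"xn--" + nfc.encode("punycode")
    return {
        "code_points": len(nfc),
        # Rough approximation: count code points that are not
        # combining marks (combining class 0).
        "base_characters": sum(
            1 for ch in nfc if unicodedata.combining(ch) == 0
        ),
        "utf8_octets": len(nfc.encode("utf-8")),
        "utf16_octets": len(nfc.encode("utf-16-be")),
        "utf32_octets": len(nfc.encode("utf-32-be")),
        "dns_octets": len(dns_form),
    }


if __name__ == "__main__":
    for sample in ["bücher", "q\u0301"]:
        print(repr(sample), label_lengths(sample))
```

For "bücher" this reports 6 code points, 7 UTF-8 octets, and 13 octets
in the DNS form (xn--bcher-kva), so the answer to "how long is this
label" depends entirely on the unit chosen.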

(3) My recollection is that the 252 number came from Ken or Mark
after discussion of the number of code points 63 user-abstract
characters could turn into given combining forms.  The statement
in the text was written --again, IIRC, after considerable WG
discussion-- as advice about how long the strings could get, not
a normative limit.  At a minimum, I'd like to see if they can
reconstruct the reasoning for that number, or if someone has the
energy to search the discussion archives, before issuing any
errata.  
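
As a rough empirical check of the point in the quoted message below
that Punycode never spends less than one ASCII character per code
point, here is a small Python sketch. It relies on Python's built-in
"punycode" codec; the helper names and the sample labels are
assumptions chosen for illustration, and a handful of samples is of
course not a proof.

```python
def punycode_lengths(label: str) -> tuple:
    """Return (number of code points, number of Punycode octets)."""
    return len(label), len(label.encode("punycode"))


def max_repeats_in_a_label(ch: str, limit: int = 63) -> int:
    """Largest n such that "xn--" + Punycode(ch * n) fits in `limit` octets.

    Assumption: the A-label is the "xn--" prefix plus the Punycode output.
    """
    best = 0
    for n in range(1, limit + 2):
        if len(b"xn--" + (ch * n).encode("punycode")) <= limit:
            best = n
    return best


if __name__ == "__main__":
    for sample in ["bücher", "ü" * 20, "日本語", "abc"]:
        points, octets = punycode_lengths(sample)
        # The encoded form is never shorter than the code point count.
        assert octets >= points
        print(repr(sample), points, octets)
    print(max_repeats_in_a_label("ü"))
```

Since the encoded form is at least as long as the number of code
points, and the "xn--" prefix takes 4 of the 63 octets, a label with
non-ASCII characters cannot carry more than 59 code points this way.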

  best,
   john




> On Tue, Sep 28, 2010 at 5:28 AM, "Martin J. Dürst"
> <duerst at it.aoyama.ac.jp> wrote:
> 
>> In section 4.2, RFC 5890 says
>> (http://tools.ietf.org/html/rfc5890#section-4.2):
>> 
>>   Because A-labels (the form actually used in the DNS) are
>>   potentially much more compressed than UTF-8 (and UTF-8 is,
>>   in general, more compressed that UTF-16 or UTF-32),
>>   U-labels that obey all of the relevant symmetry (and other)
>>   constraints of these documents may be quite a bit longer,
>>   potentially up to 252 characters (Unicode code points).
>> 
>> I'm not at all sure where the number of 252 characters is
>> coming from. It does not at all match the justification given
>> here. Punycode is a very clever encoding scheme, but it never
>> compresses any character to less than a full ASCII character.
>> So on this ground, it is impossible to squeeze more than 63
>> characters into a label.
>> 
>> The number may have come from my mail entitled
>> Re: Definitions limit on label length in UTF-8
>> with date and message id
>> Date: Sun, 13 Sep 2009 17:05:58 +0900
>> Message-ID: <4AACA7E6.1070503 at it.aoyama.ac.jp>
>> 
>> where 252 is indeed the largest number appearing, but
>> considerations were very carefully done in *octets*
>> throughout.
>> 
>> So I think we should submit an erratum to fix this to 252
>> octets.
>> 
>> Regards,    Martin.
>> 
>> --
>> # -# Martin J. Dürst, Professor, Aoyama Gakuin University
>> # -# http://www.sw.it.aoyama.ac.jp
>> # mailto:duerst at it.aoyama.ac.jp
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>> 





