Definitions limit on label length in UTF-8 (was: Re: Fwd: ICANN News Alert -- Status Update: IDN ccTLD Fast Track Process Implementation)

John C Klensin klensin at jck.com
Thu Sep 10 18:08:36 CEST 2009



--On Thursday, September 10, 2009 11:31 -0400 Andrew Sullivan
<ajs at shinkuro.com> wrote:

> On Thu, Sep 10, 2009 at 07:25:25PM +0900, "Martin J. Dürst"
> wrote:
>> There is at least one big issue in there, namely the issue of
>> limiting the length of labels by measuring their length in
>> UTF-8. I very much hope this issue can be fixed asap.
> 
> I think I understand your objection, but I'm surprised that
> you think it is totally new.  I just looked, and the same
> basic text is in the -00 draft of definitions, which appeared
> in October of 2008.
> 
> What length restriction would you prefer instead?  I suspect
> the reason for the restriction is that a "domain name label
> slot" in most applications is 63 octets long.

FWIW, that was exactly the concern that motivated the text
(which, I believe, was actually in Rationale before the text was
pulled into Definitions-00).  We are expecting applications to
be able to switch freely back and forth between U-labels and
A-labels in the same "slots" (or buffers, or whatever word one
wants to use).  I haven't done the arithmetic, but I strongly
suspect that, if one ended up with a label consisting of code
points from plane 1 or above that were close together (those
code points occupy four octets each in UTF-8), the compactness
of Punycode encoding could result in a UTF-8 string that was
longer than the ACE.  And that, in turn, could lead to all sorts
of practical problems if we remove the dual test for 63 octets.
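
The arithmetic can be checked with a rough sketch like the one below.
It assumes Python 3 and its built-in "punycode" codec. The label is a
hypothetical one chosen only for its length properties (16 consecutive
plane 1 code points starting at U+10428), and the sketch makes no claim
that such a label would pass the other IDNA validity checks.

```python
# Hypothetical label: 16 consecutive plane 1 code points (U+10428..U+10437).
# Each of these occupies four octets in UTF-8.
label = "".join(chr(0x10428 + i) for i in range(16))

# Length of the U-label form, measured in UTF-8 octets.
utf8_len = len(label.encode("utf-8"))

# A-label form: the ACE prefix "xn--" followed by the Punycode encoding.
ace = "xn--" + label.encode("punycode").decode("ascii")
ace_len = len(ace)

print("UTF-8 octets:", utf8_len)
print("ACE octets:  ", ace_len)
print("UTF-8 form exceeds 63 octets:", utf8_len > 63)
print("ACE form fits in 63 octets:  ", ace_len <= 63)
```

If the sketch is right, the UTF-8 form comes to 64 octets while the ACE
form stays well under 63, which is the case where a test on the A-label
alone would let through a U-label that overflows the same slot.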

I'm agnostic as to whether this needs to be explained better (in
either Definitions or Rationale), but (speaking personally, not
as editor) I would be extremely hesitant to change the
restriction.

    john


