Definitions limit on label length in UTF-8

John C Klensin klensin at jck.com
Mon Sep 14 16:47:20 CEST 2009



--On Monday, September 14, 2009 12:24 +0200 Harald Alvestrand
<harald at alvestrand.no> wrote:

> Martin J. Dürst wrote:
>> Overall, we get a maximum label length in octets of 252
>> octets for UTF-32 (with US-ASCII), and 224 octets in UTF-8
>> and UTF-16 (with Old Italic and the like).
>>
> Documenting these 3 numbers as "a U-label can't get longer
> than that and fit into an A-label" seems sufficient to avoid
> the spectre of "unlimited length" to me.

While people will probably want to debate the precise way I
handled this, I simply documented the maximum (252) in Defs-11
(now posted).  The reasons were:

(i) I didn't think much would be served by the added complexity.
Those who want to try to save a few octets can make their own
calculations.

(ii) I think Martin's calculation may be wrong.  For example, if
one built a label entirely with characters that require
surrogate pairs, the UTF-16 and UTF-32 forms are the same length
(four octets per character).
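
The encoding arithmetic behind point (ii) can be checked with a
short Python sketch.  The helper name encoded_lengths and the
two sample labels are illustrative assumptions, not part of any
draft.  The count of 56 characters is inferred from the quoted
224-octet figure (224 / 4) and the count of 63 from the
252-octet figure (252 / 4); neither label has been checked
against the Punycode limit on A-label length.

```python
def encoded_lengths(label: str) -> dict[str, int]:
    """Return the octet length of `label` in UTF-8, UTF-16, and UTF-32."""
    # The "-be" codecs are used so that no byte order mark is counted.
    return {
        "utf-8": len(label.encode("utf-8")),
        "utf-16": len(label.encode("utf-16-be")),
        "utf-32": len(label.encode("utf-32-be")),
    }


# U+10300 OLD ITALIC LETTER A lies outside the BMP, so UTF-16
# needs a surrogate pair (4 octets) to represent it.
OLD_ITALIC_A = "\U00010300"

# Hypothetical sample labels; the lengths 56 and 63 are assumptions
# derived from the figures quoted in the thread.
supplementary_label = OLD_ITALIC_A * 56
ascii_label = "a" * 63

print(encoded_lengths(supplementary_label))
# {'utf-8': 224, 'utf-16': 224, 'utf-32': 224}
print(encoded_lengths(ascii_label))
# {'utf-8': 63, 'utf-16': 126, 'utf-32': 252}
```

For the all-supplementary label the three encodings come out at
the same length, while the all-ASCII label reaches 252 octets
only in UTF-32.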

> People with more
> expertise in actually writing the code to handle these names
> will have better info.

Yes.

>...

   john


