Definitions limit on label length in UTF-8
John C Klensin
klensin at jck.com
Mon Sep 14 16:47:20 CEST 2009
--On Monday, September 14, 2009 12:24 +0200 Harald Alvestrand
<harald at alvestrand.no> wrote:
> Martin J. Dürst wrote:
>> Overall, we get a maximum label length in octets of 252
>> octets for UTF-32 (with US-ASCII), and 224 octets in UTF-8
>> and UTF-16 (with Old Italic and the like).
> Documenting these 3 numbers as "a U-label can't get longer
> than that and fit into an A-label" seems sufficient to avoid
> the spectre of "unlimited length" to me.
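A rough way to check the UTF-8 figure quoted above is sketched here. It assumes Python's built-in "punycode" codec follows RFC 3492, takes 63 octets as the A-label limit and "xn--" as the prefix, and ignores every other IDNA validity rule. The helper names (a_label_length, fits_in_a_label, longest_run) are hypothetical and exist only for illustration.

```python
# Sketch only: measures A-label length, does not validate the label.
MAX_A_LABEL_OCTETS = 63      # DNS limit on a single label
ACE_PREFIX = "xn--"          # prefix that marks an A-label

OLD_ITALIC_A = "\U00010300"  # a supplementary-plane character


def a_label_length(u_label: str) -> int:
    """Octets in the ACE prefix plus the Punycode form of u_label."""
    return len(ACE_PREFIX) + len(u_label.encode("punycode"))


def fits_in_a_label(u_label: str) -> bool:
    """True if the encoded form stays within the 63-octet label limit."""
    return a_label_length(u_label) <= MAX_A_LABEL_OCTETS


def longest_run(ch: str) -> int:
    """Largest n such that ch repeated n times still fits in an A-label."""
    n = 0
    while fits_in_a_label(ch * (n + 1)):
        n += 1
    return n


if __name__ == "__main__":
    n = longest_run(OLD_ITALIC_A)
    print(n, "characters,", len((OLD_ITALIC_A * n).encode("utf-8")), "UTF-8 octets")
```

Repeating a single character is close to the best case for Punycode, since every repeat after the first costs one output digit, so the run length found this way is a reasonable proxy for the longest label built from that script.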
While people will probably want to debate the precise way I
handled this, I simply documented the maximum (252 octets) in
Defs-11 (now posted). The reasons were:
(i) I didn't think much would be served by the added complexity.
Those who want to try to save a few octets can make their own
(ii) I think Martin's calculation may be wrong. For example, if
one built a label entirely from characters that require
surrogate pairs, its UTF-16 and UTF-32 forms would be the same
length.
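The surrogate-pair case can be illustrated with a short sketch. It uses U+10300 (an Old Italic letter) as the example character and a count of 56, chosen only so that the total matches the 224-octet figure quoted earlier in the thread. The function name encoded_lengths is hypothetical. The big-endian codec variants are used so that no byte-order mark is counted.

```python
def encoded_lengths(label: str) -> dict:
    """Length in octets of label under each Unicode encoding form."""
    return {
        "utf-8": len(label.encode("utf-8")),
        # The "-be" variants avoid the BOM that plain "utf-16" and
        # "utf-32" would prepend, which would inflate the counts.
        "utf-16": len(label.encode("utf-16-be")),
        "utf-32": len(label.encode("utf-32-be")),
    }


# Characters outside the BMP need a surrogate pair in UTF-16, so each
# one takes 4 octets in UTF-8, UTF-16 and UTF-32 alike.
supplementary = "\U00010300" * 56

# For contrast, a BMP character such as U+00FC takes 2 octets in
# UTF-8 and UTF-16 but 4 in UTF-32.
bmp = "\u00fc" * 56

if __name__ == "__main__":
    print(encoded_lengths(supplementary))
    print(encoded_lengths(bmp))
```

For the supplementary-plane label all three lengths coincide, which is the situation described in (ii). The sketch does not settle which maximum is the right one to document.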
> People with more
> expertise in actually writing the code to handle these names
> will have better info.