Definitions limit on label length in UTF-8
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Sep 15 03:57:58 CEST 2009
On 2009/09/14 23:47, John C Klensin wrote:
> --On Monday, September 14, 2009 12:24 +0200 Harald Alvestrand
> <harald at alvestrand.no> wrote:
>> Documenting these 3 numbers as "an U-label can't get longer
>> than that and fit into an A-label" seems sufficient to avoid
>> the spectre of "unlimited length" to me.
> While people will probably want to debate the precise way I
> handled this, I simply documented the maximum (252) in Defs-11
> (now posted). The reasons were:
> (i) I didn't think much would be served by the added complexity.
> Those who want to try to save a few octets can make their own
Fine with me.
> (ii) I think Martin's calculation may be wrong. For example, if
> one built a label entirely with characters that require
> surrogate pairs, UTF-16 and UTF-32 are the same length.
For those labels that require surrogate pairs, indeed, the lengths are
the same (namely max. 224 octets). The longer overall length limit for
UTF-32 stems from the fact that even US-ASCII characters take four bytes
in UTF-32.
Of course, there is still some possibility that there's an error in my
calculations. Cross-checking would be welcome.
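As one possible aid to that cross-checking, here is a minimal Python sketch that measures a given label in UTF-8, UTF-16 and UTF-32 (without a BOM) and computes the length of the corresponding A-label. It assumes Python's built-in "punycode" codec, which implements raw RFC 3492 encoding only; it does no IDNA validation or normalization, so it checks octet arithmetic, not label validity. The function names are hypothetical. It shows the two effects discussed above: a supplementary-plane character costs four octets in both UTF-16 and UTF-32, while even a US-ASCII character costs four octets in UTF-32.

```python
def octet_lengths(label: str) -> dict:
    """Octet counts of `label` in several encoding forms, plus the length
    of its A-label. Hypothetical helper for cross-checking only: the
    'punycode' codec is raw RFC 3492, with no IDNA checks applied."""
    if label.isascii():
        # Pure ASCII labels are not Punycode-encoded; they stand as-is.
        a_label = label.encode("ascii")
    else:
        a_label = b"xn--" + label.encode("punycode")
    return {
        "codepoints": len(label),
        "utf8": len(label.encode("utf-8")),
        "utf16": len(label.encode("utf-16-le")),  # -le variant: no BOM
        "utf32": len(label.encode("utf-32-le")),  # -le variant: no BOM
        "a_label": len(a_label),
    }


def fits_in_a_label(label: str) -> bool:
    """True if the (unvalidated) A-label form is at most 63 octets."""
    return octet_lengths(label)["a_label"] <= 63


if __name__ == "__main__":
    # 63 ASCII characters: 63 octets in UTF-8, but 252 octets in UTF-32.
    print(octet_lengths("a" * 63))
    # Supplementary-plane characters: UTF-16 and UTF-32 lengths coincide.
    print(octet_lengths("\U00010000" * 10))
    # A familiar example: "bücher" <-> "xn--bcher-kva".
    print(octet_lengths("bücher"), fits_in_a_label("bücher"))
```

Note that an all-ASCII string is not itself a U-label; it is included only to show where a 252-octet figure for UTF-32 can come from (63 × 4).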
As for limits in codepoints, that limit is 63 codepoints. But in all
cases, these limits only apply to valid Unicode, not to stuff before
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp