Definitions limit on label length in UTF-8
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Sep 15 03:57:58 CEST 2009
On 2009/09/14 23:47, John C Klensin wrote:
>
> --On Monday, September 14, 2009 12:24 +0200 Harald Alvestrand
> <harald at alvestrand.no> wrote:
>> Documenting these 3 numbers as "an U-label can't get longer
>> than that and fit into an A-label" seems sufficient to avoid
>> the spectre of "unlimited length" to me.
>
> While people will probably want to debate the precise way I
> handled this, I simply documented the maximum (252) in Defs-11
> (now posted). The reasons were:
>
> (i) I didn't think much would be served by the added complexity.
> Those who want to try to save a few octets can make their own
> calculations.
Fine with me.
> (ii) I think Martin's calculation may be wrong. For example, if
> one built a label entirely with characters that require
> surrogate pairs, UTF-16 and UTF-32 are the same length.
For those labels that require surrogate pairs, indeed, the lengths are
the same (namely max. 224 octets). The longer overall length limit for
UTF-32 stems from the fact that even US-ASCII characters take four bytes
in UTF-32.
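For anyone who wants to cross-check these relationships, here is a minimal sketch. It assumes Python 3.7 or later and its built-in "punycode" codec. The helper name encoded_lengths is made up for illustration. The code only counts octets for a string that is already valid, mapped Unicode; it does no IDNA validation.

```python
def encoded_lengths(label: str) -> dict:
    """Count code points and octets of `label` in each encoding form.

    Hypothetical helper for illustration only: no IDNA mapping or
    validation is performed.
    """
    return {
        "code_points": len(label),
        "utf8": len(label.encode("utf-8")),
        # The "-be" variants avoid counting a byte order mark.
        "utf16": len(label.encode("utf-16-be")),
        "utf32": len(label.encode("utf-32-be")),
        # An A-label is "xn--" plus the Punycode form; it is only
        # meaningful when the label contains a non-ASCII character.
        "a_label": None if label.isascii()
        else len("xn--") + len(label.encode("punycode")),
    }


# Ten US-ASCII characters: two octets each in UTF-16, four in UTF-32.
ascii_label = "a" * 10
# 56 copies of U+10000, which needs a surrogate pair in UTF-16.
supplementary_label = "\U00010000" * 56

for name, label in [("ascii", ascii_label),
                    ("supplementary", supplementary_label)]:
    print(name, encoded_lengths(label))
```

For the all-supplementary label, UTF-16 and UTF-32 both come out at 224 octets, while for the US-ASCII label UTF-32 is twice as long as UTF-16.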
Of course, there is still some possibility that there's an error in my
calculations. Cross-checking would be welcome.
As for limits in codepoints, that limit is 63 codepoints. But in all
cases, these limits only apply to valid Unicode, not to stuff before
mapping.
Regards, Martin.
--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp