Potential Erratum re. length limits in RFC 5890
John C Klensin
klensin at jck.com
Wed Sep 29 13:27:51 CEST 2010
Thanks.
john
--On Tuesday, September 28, 2010 15:02 -0700 Kenneth Whistler
<kenw at sybase.com> wrote:
> John Klensin said:
>
>> (3) My recollection is that the 252 number came from Ken
>
> Not me.
>
>> or Mark
>> after discussion of the number of code points that 63 user-abstract
>> characters could turn into, given combining forms.
>
> It has nothing to do with combining characters.
>
>> The statement
>> in the text was written --again, IIR after considerable WG
>> discussion-- as advice about how long the strings could get,
>> not a normative limit. At a minimum, I'd like to see if
>> they can reconstruct the reasoning for that number,
>
> The reasoning is quite simple. It has to do with Unicode
> encoding forms (and again, nothing whatsoever to do with
> combining characters).
>
> 63 encoded characters (Unicode code points) have the
> following minimum and maximum lengths (expressed in octets),
> depending on encoding forms and which particular characters are
> involved.
>
> For 63 characters in the ASCII range (U+0020..U+007E)
>
> UTF-8 = 63 octets
> UTF-16 = 126 octets
> UTF-32 = 252 octets
>
> For 63 characters from the supplementary planes (U+10000 and
> above)
>
> UTF-8 = 252 octets
> UTF-16 = 252 octets
> UTF-32 = 252 octets
>
> Those are the minimum and maximum cases. For a more typical
> mix of characters, the UTF-8 length will be >= 63 and <= 252
> octets (63 characters drawn only from the BMP take at most
> 189 octets in UTF-8).
>
> That's it... no mumbo-jumbo involved about what a user
> perceives as a character or what number of combining
> characters can be applied to a base character or any of that.
>
> --Ken
>
>> or if someone has the
>> energy to search the discussion archives, before issuing any
>> errata.
>
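The octet counts Ken lists can be checked with a short sketch. It encodes a 63-code-point string in each encoding form and reports its length in octets. The sample characters (U+0061, U+4E2D, U+10000), the helper name `octet_lengths`, and the mixed example are illustrative choices, not something taken from the thread or from RFC 5890. The big-endian codecs are used so that Python does not prepend a byte-order mark, which would otherwise add 2 or 4 octets.

```python
# Sketch: octet lengths of a 63-code-point string in UTF-8, UTF-16 and UTF-32.
# The sample characters and the helper name are hypothetical, for illustration only.

CHARS = 63  # number of code points discussed in the thread


def octet_lengths(s: str) -> dict:
    """Return the encoded length of s in octets for each Unicode encoding form."""
    return {
        # "-be" variants avoid the byte-order mark that "utf-16"/"utf-32" would add
        "UTF-8": len(s.encode("utf-8")),
        "UTF-16": len(s.encode("utf-16-be")),
        "UTF-32": len(s.encode("utf-32-be")),
    }


ascii_s = "a" * CHARS          # U+0061: 1 octet in UTF-8
bmp_s = "\u4e2d" * CHARS       # U+4E2D (BMP): 3 octets in UTF-8
supp_s = "\U00010000" * CHARS  # U+10000 (supplementary): 4 octets in every form
mixed_s = "a" * 21 + "\u4e2d" * 21 + "\U00010000" * 21  # 63 code points total

if __name__ == "__main__":
    for name, s in [("ASCII", ascii_s), ("BMP", bmp_s),
                    ("supplementary", supp_s), ("mixed", mixed_s)]:
        print(name, octet_lengths(s))
```

Run as shown, it prints 63/126/252 octets for the ASCII string, 189/126/252 for the BMP string, 252/252/252 for the supplementary string, and 168/168/252 for the mixed string. So 252 is the ceiling in every encoding form and 63 the UTF-8 floor, matching the figures quoted above.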
More information about the Idna-update mailing list