Potential Erratum re. length limits in RFC 5890

John C Klensin klensin at jck.com
Wed Sep 29 13:27:51 CEST 2010


Thanks.
   john


--On Tuesday, September 28, 2010 15:02 -0700 Kenneth Whistler
<kenw at sybase.com> wrote:

> John Klensin said:
> 
>> (3) My recollection is that the 252 number came from Ken
> 
> Not me.
> 
>> or Mark
>> after discussion of the number of code points 63 user-abstract
>> characters could turn into given combining forms.
> 
> It has nothing to do with combining characters.
> 
>> The statement
>> in the text was written --again, IIRC, after considerable WG
>> discussion-- as advice about how long the strings could get,
>> not a normative limit.  At a minimum, I'd like to see if
>> they can reconstruct the reasoning for that number,
> 
> The reasoning is quite simple. It has to do with Unicode
> encoding forms (and again, nothing whatsoever to do with
> combining characters).
> 
> 63 encoded characters (Unicode code points) have the
> following minimum and maximum lengths (expressed in octets),
> depending on encoding forms and which particular characters are
> involved.
> 
> For 63 characters in the ASCII range (U+0020..U+007E)
> 
>    UTF-8  =  63 octets
>    UTF-16 = 126 octets
>    UTF-32 = 252 octets
>    
> For 63 characters from the supplementary planes (U+10000 and
> above)
> 
>    UTF-8  = 252 octets
>    UTF-16 = 252 octets
>    UTF-32 = 252 octets
>    
> Those are the minimum and maximum cases. For some more
> typical mix of characters from the BMP, the UTF-8 length
> will be >= 63 and <= 252 octets.
>    
> That's it... no mumbo-jumbo involved about what a user
> perceives as a character or what number of combining
> characters can be applied to a base character or any of that.
> 
> --Ken   
> 
>> or if someone has the
>> energy to search the discussion archives, before issuing any
>> errata.  
> 
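
The arithmetic in Ken's tables can be checked with a short Python sketch.
This is only an illustration under stated assumptions: the helper name
encoded_lengths and the sample strings are hypothetical, and the counts
are for the encoded octets alone, with no byte order mark.

```python
def encoded_lengths(text):
    """Return the octet count of text in UTF-8, UTF-16, and UTF-32.

    The "-le" codec variants are used so that Python does not prepend
    a byte order mark, which would inflate the counts.
    """
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }


# Hypothetical sample strings of 63 code points each.
ascii_label = "a" * 63                   # ASCII range, 1 octet each in UTF-8
supplementary_label = "\U00010000" * 63  # first supplementary-plane code point
bmp_label = "\u00e9" * 63                # a BMP character, 2 octets in UTF-8

for name, label in [
    ("ASCII", ascii_label),
    ("supplementary", supplementary_label),
    ("BMP (U+00E9)", bmp_label),
]:
    print(name, encoded_lengths(label))
```

The ASCII case gives 63, 126, and 252 octets, and the supplementary-plane
case gives 252 octets in all three encoding forms, matching the minimum
and maximum cases quoted above. The BMP sample falls in between for UTF-8.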

More information about the Idna-update mailing list