Potential Erratum re. length limits in RFC 5890

Kenneth Whistler kenw at sybase.com
Wed Sep 29 00:23:52 CEST 2010


------------- Begin Forwarded Message -------------

Date: Tue, 28 Sep 2010 15:22:31 -0700 (PDT)
From: Kenneth Whistler <kenw at atlantis-new.sybase.com>
Subject: Re: Potential Erratum re. length limits in RFC 5890
To: markus.icu at gmail.com
Cc: idna at alvestrand.no, kenw at birdie.sybase.com
Content-MD5: 02NXPA2scgCAiVtnympnAQ==

Markus Scherer wrote:

> > Those are the minimum and maximum cases. For some more
> > typical mix of characters from the BMP, the UTF-8 length
> > will be >= 63 and <= 252 octets.
> 
> Correct. (Or actually, with a "mix of characters from the BMP" you would get
> at most 189 UTF-8 bytes.)

Yes, of course. I was just quoting the outside limits again.
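
For concreteness, the outside limits and the BMP figure can be
reproduced with a short sketch. It assumes a label of 63 code points
(the bound implied by the figures quoted above); the function name and
the sample characters are illustrative choices only.

```python
# Sketch: UTF-8 octet counts for a label of 63 code points (assumed bound).
MAX_LABEL_CODE_POINTS = 63


def utf8_length_bounds(n=MAX_LABEL_CODE_POINTS):
    """Return (ASCII-only, BMP maximum, supplementary maximum) UTF-8 lengths."""
    ascii_only = len(("a" * n).encode("utf-8"))              # 1 octet per code point
    bmp_max = len(("\u4e00" * n).encode("utf-8"))            # 3 octets per code point
    supplementary = len(("\U00010000" * n).encode("utf-8"))  # 4 octets per code point
    return ascii_only, bmp_max, supplementary


print(utf8_length_bounds())  # (63, 189, 252)
```

So 252 is a count of UTF-8 octets (63 x 4), and 189 (63 x 3) is the
ceiling when every character comes from the BMP.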
 
> However, Martin quoted the RFC as saying
> 
>   [...] U-labels that
>   obey all of the relevant symmetry (and other) constraints of these
>   documents may be quite a bit longer, potentially up to 252 characters
>   (Unicode code points).
> 
> How do 252 *Unicode code points* relate to the A-label length limit of 63
> *octets* (where each is an ASCII letter or digit)?
> (Aside from the xn-- prefix which reduces the label contents to 59 octets.)
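
The octet limit in question can be checked mechanically. The following
is only a rough sketch: it uses Python's built-in "punycode" codec,
performs no IDNA2008 validation or mapping, and assumes the input
contains at least one non-ASCII character. The helper names are
hypothetical.

```python
# Sketch: does a U-label fit in the 63-octet A-label limit?
ACE_PREFIX = "xn--"
MAX_A_LABEL_OCTETS = 63


def a_label_length(u_label):
    """Length in octets of 'xn--' plus the Punycode form of u_label."""
    return len(ACE_PREFIX) + len(u_label.encode("punycode"))


def fits_in_a_label(u_label):
    """True if the resulting A-label is at most 63 octets long."""
    return a_label_length(u_label) <= MAX_A_LABEL_OCTETS


print(a_label_length("b\u00fccher"))   # 13, i.e. "xn--bcher-kva"
print(fits_in_a_label("b\u00fccher"))  # True
```

The prefix takes 4 of the 63 octets, which leaves the 59 octets
mentioned above for the encoded label contents.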

I was (deliberately) not addressing the issue of the error in
the RFC text reported by Martin.

I was only responding to the item in John Klensin's email
apparently addressed to myself and Mark requesting a
recovery of the reasoning for where the number "252"
came from. (And clarifying that the reasoning for that
number had nothing to do with the arcanities of combining
character sequences and user-perceived character identities
and the like.)

--Ken



------------- End Forwarded Message -------------



