Potential Erratum re. length limits in RFC 5890

Jiankang YAO yaojk at cnnic.cn
Thu Sep 30 03:39:50 CEST 2010


+1.

Good point: "So it is best to just avoid a mention of a limit like 252; either that or explain the situation in more detail."

Good example: repeat the source sequence that normalizes to U+01DE ( Ǟ ) 57 times. That is of length 684.


Jiankang Yao

  ----- Original Message ----- 
  From: Mark Davis ☕ 
  To: John C Klensin 
  Cc: Markus Scherer ; idna-update at alvestrand.no ; Kenneth Whistler 
  Sent: Thursday, September 30, 2010 9:31 AM
  Subject: Re: Potential Erratum re. length limits in RFC 5890


  Ken is right about the maximal source label length being at least 252 in the absence of mapping. 


  With the use of mapping, however, it could be substantially longer. This can happen when a series of characters in the source maps to a single character, which is then encoded as a single byte in Punycode. That can happen with IDNA2008, or with UTS46 (or any other mapping preprocessing for IDNA2008).


  So it is best to just avoid a mention of a limit like 252; either that or explain the situation in more detail.


  ====


  Details. As an illustration, suppose that you had the following, in UTF-32.


  00 00 00 41 00 00 03 08 00 00 03 04


  That sequence, when normalized to NFC, yields 


  U+01DE ( Ǟ ) LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON, one character. 
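
  As a quick check of the step above, here is a minimal Python sketch using only the standard library. The variable names are illustrative, and it assumes the quoted bytes are big-endian UTF-32 without a byte order mark.

```python
import unicodedata

# The twelve bytes quoted above, read as big-endian UTF-32 (assumption: no BOM).
raw = bytes.fromhex("00 00 00 41 00 00 03 08 00 00 03 04")

# Decoding gives three code points: U+0041, U+0308, U+0304.
source = raw.decode("utf-32-be")

# NFC composes the base letter with both combining marks.
nfc = unicodedata.normalize("NFC", source)

print([f"U+{ord(ch):04X}" for ch in source])
print([f"U+{ord(ch):04X}" for ch in nfc], unicodedata.name(nfc))
```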


  Repeat the source sequence 57 times. That is of length 684 (57 × 12 bytes in UTF-32).


  When normalized under NFC, you get 


  ǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞǞ



  After case mapping to lowercase (U+01DF, ǟ), that turns into the valid Punycode:


  xn--bkaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
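
  The whole example can be reproduced with a rough Python sketch, again using only the standard library. Two assumptions: str.lower() stands in for the case mapping that UTS46-style preprocessing would apply, and Python's built-in "punycode" codec stands in for a full IDNA implementation (it performs no validity checks). The names SEQUENCE, REPEATS and ace are illustrative only.

```python
import unicodedata

SEQUENCE = "A\u0308\u0304"  # U+0041 U+0308 U+0304, the source sequence above
REPEATS = 57

source = SEQUENCE * REPEATS

# Source length in UTF-32 bytes (4 bytes per code point, no BOM).
utf32_length = len(source.encode("utf-32-be"))

# NFC collapses each three-code-point sequence into a single U+01DE.
composed = unicodedata.normalize("NFC", source)

# Assumption: lowercase as a stand-in for mapping; U+01DE becomes U+01DF.
mapped = composed.lower()

# Punycode-encode the mapped label and add the ACE prefix.
ace = "xn--" + mapped.encode("punycode").decode("ascii")

print("source length in UTF-32 bytes:", utf32_length)
print("code points after NFC:", len(composed))
print("A-label:", ace)
print("A-label length:", len(ace))
```

  Under these assumptions the resulting label comes to 63 characters, which is the DNS limit for a single label, while the source is 684 bytes long.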


  Mark
