Potential Erratum re. length limits in RFC 5890
John C Klensin
klensin at jck.com
Thu Sep 30 04:49:58 CEST 2010
--On Wednesday, September 29, 2010 18:31 -0700 Mark Davis ☕
<mark at macchiato.com> wrote:
> Ken is right about the maximal source label length being at
> least 252 in the absence of mapping.
> With the use of mapping, however, it could be substantially
> longer. This can happen when a series of characters in the source
> maps to a single character, which is then mapped to a single
> byte in Punycode. That can happen with IDNA2008, or with UTS46
> (or any other mapping preprocessing for IDNA2008).
> So it is best to just avoid a mention of a limit like 252;
> either that or explain the situation in more detail.
Thanks for the explanation and illustration. Do note two things:
(1) We aren't discussing a document that is still in development
here, but a published RFC. While I can imagine the document
being "updated" (really replaced by a new version) at some
point, I doubt that it will be any time soon unless we discover
a really severe problem. Because the present comment was more
or less an aside, it would be hard to claim it is that severe.
So the most that it is possible to do at this stage is to file
an erratum. Not only do few people look at those in practice,
but, for something like this, the most likely status for it is,
more or less, "note to be considered when the document is next
revised".
(2) The text of RFC 5890 at that point is, unless my memory has
gone seriously bad, strictly discussing U-labels and A-labels.
A U-label involves absolutely no mapping: it is required to be
in NFC form, but there isn't even mapping _to_ NFC form. So,
using your example, consider
> 00 00 00 41 00 00 03 08 00 00 03 04
> That sequence, when normalized to NFC, yields
> U+01DE ( Ǟ ) LATIN CAPITAL LETTER A WITH DIAERESIS AND
> MACRON, one character.
Yes. But, because that sequence is not in NFC form, it isn't a
valid U-label. So the example is interesting --and very
important to mapping approaches like those outlined in RFC 5895
or UTR 46-- but it is irrelevant to RFC 5890.
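For anyone who wants to check the example, here is a minimal sketch in Python using only the standard unicodedata module. It decodes the quoted byte sequence as UTF-32BE (an assumption about the encoding, based on the four-byte groups), normalizes it to NFC, and tests whether each string is already in NFC form. The helper name is_nfc is mine, purely for illustration; it is not taken from any of the RFCs.

```python
import unicodedata

# The byte sequence quoted above, read as UTF-32BE (assumed encoding):
# U+0041 U+0308 U+0304.
raw = bytes.fromhex("00000041 00000308 00000304")
source = raw.decode("utf-32-be")

# Normalizing to NFC composes the three code points into one.
nfc = unicodedata.normalize("NFC", source)


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", s) == s


print([f"U+{ord(c):04X}" for c in source])  # code points before normalization
print([f"U+{ord(c):04X}" for c in nfc])     # code points after normalization
print(is_nfc(source), is_nfc(nfc))          # is each string already in NFC form?
```

The source string fails the NFC test and the normalized one passes it, which is the whole point: only the latter could be a candidate U-label, so the three-to-one shrinkage happens before the definitions in RFC 5890 ever come into play.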