Potential Erratum re. length limits in RFC 5890

Mark Davis ☕ mark at macchiato.com
Thu Sep 30 23:42:00 CEST 2010


Thanks. I agree that it isn't relevant to U-Labels.

Mark

*— Il meglio è l’inimico del bene —*


On Wed, Sep 29, 2010 at 19:49, John C Klensin <klensin at jck.com> wrote:

>
> --On Wednesday, September 29, 2010 18:31 -0700 Mark Davis ☕
> <mark at macchiato.com> wrote:
>
> > Ken is right about the maximal source label length being at
> > least 252 in the absence of mapping.
> >
> > With the use of mapping, however, it could be substantially
> > longer. This can happen when a series of characters in the
> > source maps to a single character, which is then mapped to a
> > single byte in Punycode. That can happen with IDNA2008, or with
> > UTS46 (or any other mapping preprocessing for IDNA2008).
> >
> > So it is best to just avoid a mention of a limit like 252;
> > either that or explain the situation in more detail.
> >...
>
> Mark,
>
> Thanks for the explanation and illustration.  Do note two things:
>
> (1) We aren't discussing a document that is still in development
> here, but a published RFC.  While I can imagine the document
> being "updated" (really replaced by a new version) at some
> point, I doubt that it will be any time soon unless we discover
> a really severe problem.   Because the present comment was more
> or less an aside, it would be hard to claim it is that severe.
> So the most that it is possible to do at this stage is to file
> an erratum.  Not only do few people look at those in practice,
> but, for something like this, the most likely status for it is,
> more or less, "note to be considered when the document is next
> revised".
>
> (2) The text of RFC 5890 at that point is, unless my memory has
> gone seriously bad, strictly discussing U-labels and A-labels.
> A U-label involves absolutely no mapping: it is required to be
> in NFC form, but there isn't even mapping _to_ NFC form.  So,
> using your example, consider
>
> > 00 00 00 41 00 00 03 08 00 00 03 04
> >
> > That sequence, when normalized to NFC, yields
> >
> > U+01DE ( Ǟ ) LATIN CAPITAL LETTER A WITH DIAERESIS AND
> > MACRON, one character.
>
> Yes.  But, because that sequence is not in NFC form, it isn't a
> valid U-label.  So the example is interesting --and very
> important to mapping approaches like those outlined in RFC 5895
> or UTS 46-- but it is irrelevant to RFC 5890.
>
>     john
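
The normalization step in the quoted example can be checked with a
short Python sketch. It assumes the twelve bytes are UTF-32BE code
units (U+0041, U+0308, U+0304); `is_nfc` is a hypothetical helper
name, not something defined by RFC 5890. It tests only the NFC
requirement, not the other rules a U-label must satisfy.

```python
import unicodedata

# Assumption: the bytes in the example are UTF-32BE code units.
raw = bytes.fromhex("00 00 00 41 00 00 03 08 00 00 03 04")
source = raw.decode("utf-32-be")  # A + COMBINING DIAERESIS + COMBINING MACRON

# Normalizing to NFC composes the three code points into one.
nfc = unicodedata.normalize("NFC", source)


def is_nfc(s: str) -> bool:
    """Hypothetical helper: True if s is already in NFC form."""
    return unicodedata.normalize("NFC", s) == s


print(len(source), len(nfc))        # code point counts before and after
print(unicodedata.name(nfc))        # name of the composed character
print(is_nfc(source), is_nfc(nfc))  # the raw sequence fails the NFC check
```

The source sequence is three code points long while its NFC form is
one, which is the length difference Mark describes; because the
source sequence is not already in NFC form, it fails the one U-label
requirement checked here, which is John's point.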