[Errata Held for Document Update] RFC5890 (4824)

RFC Errata System rfc-editor at rfc-editor.org
Fri Oct 7 22:10:37 CEST 2016


The following errata report has been held for document update 
for RFC5890, "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework". 

--------------------------------------
You may review the report below and at:
http://www.rfc-editor.org/errata_search.php?rfc=5890&eid=4824

--------------------------------------
Status: Held for Document Update
Type: Technical

Reported by: Juan Altmayer Pizzorno <juan at sparkpost.com>
Date Reported: 2016-05-17
Held by: Alexey Melnikov (IESG)

Section: 4.2

Original Text
-------------
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 252 characters
(Unicode code points).

Corrected Text
--------------
Because A-labels (the form actually used in the
DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
in general, more compressed that UTF-16 or UTF-32), U-labels that
obey all of the relevant symmetry (and other) constraints of these
documents may be quite a bit longer, potentially up to 59 Unicode
code points, or up to 236 octets.

Notes
-----
(The same rationale as my report for 2.3.2.1 applies:)

The contents of a U-label are encoded in the at most 59 ASCII characters (see 2.3.2.1)
that the Punycode algorithm outputs for the corresponding A-label: 63 octets less the
4-character "xn--" prefix.  The Punycode decoder
(https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one of those ASCII
characters for each code point it inserts into the U-label.  A U-label can therefore
contain at most 59 Unicode code points.
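
The counting argument can be spot-checked with a short Python sketch. It assumes
Python's built-in "punycode" codec as a stand-in for the RFC 3492 encoder; the helper
name punycode_length and the sample labels are illustrative only, and nothing here
validates labels against the IDNA rules.

```python
# Sketch: a Punycode string is never shorter (in ASCII characters) than the
# number of code points it encodes, so 59 ASCII characters can carry at most
# 59 code points.

MAX_LABEL_OCTETS = 63   # DNS limit on a label, which an A-label must obey
ACE_PREFIX = "xn--"     # prefix that starts every A-label
MAX_PUNYCODE_CHARS = MAX_LABEL_OCTETS - len(ACE_PREFIX)  # 63 - 4 = 59

# Illustrative samples (assumptions, not taken from the report).
SAMPLES = ["bücher", "ü" * 10, "日本語", "a\u0301"]


def punycode_length(label: str) -> int:
    """ASCII characters in the Punycode form of label (hypothetical helper)."""
    return len(label.encode("punycode"))


if __name__ == "__main__":
    for sample in SAMPLES:
        print(len(sample), "code points ->",
              punycode_length(sample), "ASCII characters")
```

For every sample the second number is at least as large as the first, which is the
property the bound relies on. This is a check on examples, not a proof.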

Since U-labels are defined (in 2.3.2.1) to be expressed in a standard Unicode Encoding
Form, and UTF-32, UTF-16, and UTF-8 (as revised by RFC3629) each encode a code point
in at most 4 octets, 59 x 4 = 236 octets is an upper bound for a U-label's length.
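
The per-code-point cost in each encoding form can be spot-checked in the same way.
The sketch below assumes Python's built-in codecs (the little-endian variants, so that
no byte order mark is counted); the function names and the choice of boundary code
points are illustrative.

```python
# Sketch: octets needed per code point in UTF-8, UTF-16 and UTF-32, sampled at
# the code points where the UTF-8 and UTF-16 lengths change.

MAX_CODE_POINTS = 59  # upper bound on code points in a U-label, per the notes

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # "-le" avoids counting a BOM
BOUNDARY_CODE_POINTS = (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFD, 0x10000, 0x10FFFF)


def octets_per_code_point(cp: int) -> dict:
    """Map each encoding name to the octets it uses for code point cp."""
    return {enc: len(chr(cp).encode(enc)) for enc in ENCODINGS}


def max_octets_per_code_point() -> int:
    """Largest octet count seen over the sampled code points and encodings."""
    return max(
        n
        for cp in BOUNDARY_CODE_POINTS
        for n in octets_per_code_point(cp).values()
    )


if __name__ == "__main__":
    worst = max_octets_per_code_point()
    print(worst, "octets per code point at most")
    print(MAX_CODE_POINTS * worst, "octets at most for a U-label")
```

The sample covers the length boundaries of each encoding form rather than every code
point, so it illustrates the 4-octet ceiling rather than proving it.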

I think it should be possible to derive a tighter bound, but its rationale would likely be
less straightforward.

I imagine the number 252 was originally derived by multiplying 63, the maximum
length of an A-label (including the "xn--" prefix), by 4, the maximum number of
octets needed to represent a code point.

--------------------------------------
RFC5890 (draft-ietf-idnabis-defs-13)
--------------------------------------
Title               : Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
Publication Date    : August 2010
Author(s)           : J. Klensin
Category            : PROPOSED STANDARD
Source              : Internationalized Domain Names in Applications (Revised)
Area                : Applications
Stream              : IETF
Verifying Party     : IESG


