referencing IDNA2008 (and IDNA2003?)
John C Klensin
klensin at jck.com
Thu Oct 21 19:56:30 CEST 2010
First, Alexey and Peter: as you are aware, this is at least the
third time lately that the question of "how to reference
IDNA2008 in context from a document that originally referenced
IDNA2003" has come up. We probably need to figure out how to
establish an Area-wide (or IETF-wide) strategy, rather than
going through it one WG or document at a time.
Comments on Jeff's note/proposal inline below...
--On Thursday, October 21, 2010 10:12 -0700 "=JeffH"
<Jeff.Hodges at KingsMountain.com> wrote:
> In the httpstate WG, we've almost completed our spec on HTTP
> Cookies (as they actually are implemented & deployed). In the
> process, we've attempted to properly reference the IDNA specs,
> but of course with the recent publication of IDNA2008 and its
> obsoleting IDNA2003, this presented a bit of head-scratching
> WRT how to properly reference them since IDNA2008 is not
> backwards compatible and there's going to be a transition
> period for some time.
> Below's what we came up with (the following are relevant
> excerpts from
> How does that look to you folks?
> [ httpstate chair & document shepherd ]
> 5.1.2. Canonicalized host names
> A canonicalized host name is the string generated by the
> following algorithm:
> 1. Convert the host name to a sequence of NR-LDH labels
> (see Section 2.3.2.2 of [RFC5890]) and/or A-labels according
> to the appropriate IDNA specification [RFC5891] or [RFC3490]
> (see Section 6.3 of this specification).
I don't have time to look right now, but you probably don't want
"NR-LDH labels". I will try to check in the next day or so if
no one else gets to it.
I'll let Andrew Sullivan or someone else comment further, but
"host name" may or may not mean what you think it does.
> 2. Convert the labels to lower case.
I was initially confused by that. I think you want "Convert the
labels resulting from Step 1" or something like that to make it
absolutely clear that you are not talking about a lower-case
conversion of non-ASCII strings.
> 3. Concatenate the labels, separating each label from the
> next with
> a %x2E (".") character.
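For illustration, steps 1 through 3 above might be sketched as follows.
This is only a sketch under stated assumptions: it uses Python's built-in
"idna" codec, which implements the IDNA2003 (RFC 3490) ToASCII operation
rather than IDNA2008, and the function name canonicalize_host is
hypothetical, not taken from the draft.

```python
def canonicalize_host(host: str) -> str:
    """Sketch of steps 1-3 of the quoted algorithm (IDNA2003 flavor)."""
    # Step 1: split into labels and convert each one to its ASCII form.
    # Assumption: the stdlib "idna" codec (RFC 3490 ToASCII), not IDNA2008.
    labels = [label.encode("idna").decode("ascii")
              for label in host.split(".")]
    # Step 2: lower-case the labels resulting from step 1. They are pure
    # ASCII at this point, so no non-ASCII case conversion is involved.
    labels = [label.lower() for label in labels]
    # Step 3: concatenate, separating labels with %x2E (".").
    return ".".join(labels)


if __name__ == "__main__":
    print(canonicalize_host("WWW.Example.COM"))
    print(canonicalize_host("B\u00fccher.Example"))
```

Note that the lower-casing in step 2 only ever sees ASCII labels here,
which is the point of tying it explicitly to the output of step 1.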
> 6.3. IDNA dependency and migration
> IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
> backwards-compatible. For this reason, there will be a
> transition period (possibly of a number of years). User agents
> SHOULD implement
> IDNA2008 [RFC5890] and MAY implement [Unicode Technical
> Standard #46
> <http://unicode.org/reports/tr46/>] in order to facilitate
> a smoother
> IDNA transition. If a user agent does not implement
> IDNA2008, the
> user agent MUST implement IDNA2003 [RFC3490].
That paragraph has the odor of FUD. Please understand that, at
a 10000 meter level, the number of practical incompatibilities
between IDNA2003 and IDNA2008 is very small. In particular:
-- Strings containing symbols, punctuation, etc., are generally
invalid under IDNA2008 and were generally valid under IDNA2003.
You might plausibly want to recommend that an implementation
ignore IDNA2008 and look strings containing them up anyway (I
might oppose that, but it doesn't make it less plausible) but
various mapping strategies have nothing to do with that. FWIW,
IDNA2008 actually permits those lookups if the application
receives an A-label. Of course, a subset of such strings may
present more opportunities for attacks on users than the
character confusion that gets all the attention, so it might not
be a wildly good idea to accept them.
-- Whether or not mapping occurs is not an incompatibility
between IDNA2003 and IDNA2008. IDNA2003 requires a particular
mapping; IDNA2008 permits, but does not require, mapping. If
you decide to map, there are at least two sets of
recommendations as to how to do so: the ones represented in UTR
46 and the ones represented in RFC 5895. If you choose one or
the other, I think you had best be prepared to defend the choice
during Last Call. IMO, "MAY implement either UTR 46 or RFC
5895" would be much more appropriate than ignoring one or the
other.
-- There are issues involving characters that IDNA2003 mapped to
nothing, particularly ZWJ and ZWNJ. You can't have it both
ways: if you choose the "map to nothing" option, names that are
legitimate for registration under IDNA2008 (and that involve
important distinctions in some languages) become inaccessible
and you risk some false positives with all of the advantages
that provides for attackers. My own recommendation would be
to adopt the IDNA2008 handling for those characters as soon as
feasible regardless of whether one implements the rest of
IDNA2008 or not, regardless of whether one applies some
particular set of mappings or not, etc.
-- There are two other problem characters that are interpreted
differently in U-label to A-label conversion in IDNA2003 and
IDNA2008. If you are going to take a position on how they
should be handled, IMO you really should discuss the issues (or
point to something that does) not do some handwaving about UTR
46.
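
To make the IDNA2003 side of the points above concrete, here is a small
sketch using Python's built-in "idna" codec. Assumptions: the codec
implements IDNA2003 (RFC 3490) ToASCII with Nameprep only, so it cannot
show what IDNA2008 does; the helper name idna2003_to_ascii is
hypothetical; and the "two other problem characters" are taken here to
be sharp s (U+00DF) and final sigma (U+03C2).

```python
def idna2003_to_ascii(label: str) -> str:
    # Assumption: stdlib "idna" codec, i.e. RFC 3490 ToASCII with Nameprep.
    return label.encode("idna").decode("ascii")


# Symbols: a label consisting of U+2603 (SNOWMAN) is accepted by IDNA2003
# and converted to an ACE ("xn--") label.
snowman = idna2003_to_ascii("\u2603")

# ZWNJ (U+200C) and ZWJ (U+200D) are mapped to nothing, so two distinct
# U-label candidates collapse onto the same looked-up name.
with_zwnj = idna2003_to_ascii("a\u200cb")
with_zwj = idna2003_to_ascii("a\u200db")
plain = idna2003_to_ascii("ab")

# Sharp s is mapped to "ss", and final sigma to ordinary small sigma,
# before conversion.
sharp_s = idna2003_to_ascii("fa\u00df")
final_sigma = idna2003_to_ascii("\u03c2")
small_sigma = idna2003_to_ascii("\u03c3")

if __name__ == "__main__":
    print(snowman.startswith("xn--"))       # symbol label was accepted
    print(with_zwnj == plain == with_zwj)   # joiners silently dropped
    print(sharp_s)                          # mapped form of "fa" + sharp s
    print(final_sigma == small_sigma)       # the two sigmas are conflated
```

Under IDNA2008 the joiners, sharp s and final sigma are not mapped away
like this, so the names that collapse above can be distinct registrable
names. That is why the handling of each deserves explicit discussion.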