referencing IDNA2008 (and IDNA2003?)
ietf at adambarth.com
Thu Oct 21 21:25:29 CEST 2010
On Thu, Oct 21, 2010 at 12:02 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
> On 10/21/10 11:56 AM, John C Klensin wrote:
>> First, Alexey and Peter: as you are aware, this is at least the
>> third time lately that the question of "how to reference
>> IDNA2008 in context from a document that originally referenced
>> IDNA2003" has come up. We probably need to figure out how to
>> establish an Area-wide (or IETF-wide) strategy, rather than
>> going through it one WG or document at a time.
>> Comments on Jeff's note/proposal inline below...
>> --On Thursday, October 21, 2010 10:12 -0700 "=JeffH"
>> <Jeff.Hodges at KingsMountain.com> wrote:
>>> In the httpstate WG, we've almost completed our spec on HTTP
>>> Cookies (as they actually are implemented & deployed). In the
>>> process, we've attempted to properly reference the IDNA specs,
>>> but of course with the recent publication of IDNA2008 and its
>>> obsoleting IDNA2003, this presented a bit of head-scratching
>>> WRT how to properly reference them since IDNA2008 is not
>>> backwards compatible and there's going to be a transition
>>> period for some time.
>>> Below's what we came up with (the following are relevant
>>> excerpts from
>>> How does that look to you folks?
>>> [ httpstate chair & document shepherd ]
>>> 5.1.2. Canonicalized host names
>>> A canonicalized host name is the string generated by the
>>> following algorithm:
>>> 1. Convert the host name to a sequence of NR-LDH labels
>>> (see Section
>>> 22.214.171.124 of [RFC5890]) and/or A-labels according to the
>>> appropriate IDNA specification [RFC5891] or [RFC3490] (see
>>> Section 6.3 of this specification)
>> I don't have time to look right now, but you probably don't want
>> "NR-LDH labels". I will try to check in the next day or so if
>> no one else gets to it.
> I think this spec refers to NR-LDH labels here because A-labels are for
> use by IDNs, but if the input to this canonicalization algorithm is a
> traditional domain name (containing no IDN labels) then there is no need
> to prefix the string with "xn--" on the way to producing an A-label --
> if you don't need an A-label, an NR-LDH label will do.
>> I'll let Andrew Sullivan or someone else comment further, but
>> "host name" may or may not mean what you think it does.
> It never does.
>>> 2. Convert the labels to lower case.
>> I was initially confused by that. I think you want "Convert the
>> labels resulting from Step 1" or something like that to make it
>> absolutely clear that you are not talking about a lower-case
>> conversion of non-ASCII strings.
> Yes, that's better.
>>> 3. Concatenate the labels, separating each label from the
>>> next with
>>> a %x2E (".") character.
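
The three quoted steps can be sketched roughly as follows. This is only an illustration, not the draft's reference algorithm: the function name `canonicalize_host` is hypothetical, and the sketch assumes Python's built-in "idna" codec, which implements IDNA2003 [RFC3490] rather than IDNA2008. ASCII labels are passed through without checking that they are valid LDH labels.

```python
def canonicalize_host(host: str) -> str:
    """Illustrative sketch of the quoted steps 1-3 (hypothetical helper).

    Assumption: non-ASCII labels are converted with Python's stdlib
    "idna" codec, i.e. IDNA2003 ToASCII, not IDNA2008.
    """
    labels = []
    for label in host.split("."):
        if all(ord(ch) < 128 for ch in label):
            # Step 1, ASCII input: keep the label as-is (no LDH validation here).
            ascii_label = label
        else:
            # Step 1, non-ASCII input: produce an "xn--" label via IDNA2003.
            ascii_label = label.encode("idna").decode("ascii")
        # Step 2: lower-case only the ASCII result of step 1.
        labels.append(ascii_label.lower())
    # Step 3: join the labels with "." (%x2E).
    return ".".join(labels)


if __name__ == "__main__":
    print(canonicalize_host("Bücher.Example.COM"))  # xn--bcher-kva.example.com
```

Lower-casing after the conversion, as suggested in the comment on step 2, means the case mapping only ever touches ASCII strings.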
>>> 6.3. IDNA dependency and migration
>>> IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
>>> backwards-compatible. For this reason, there will be a transition
>>> period (possibly of a number of years). User agents
>>> SHOULD implement
>>> IDNA2008 [RFC5890] and MAY implement [Unicode Technical
>>> Standard #46
>>> <http://unicode.org/reports/tr46/>] in order to facilitate
>>> a smoother
>>> IDNA transition. If a user agent does not implement
>>> IDNA2008, the
>>> user agent MUST implement IDNA2003 [RFC3490].
>> That paragraph has the odor of FUD. Please understand that, at
>> a 10000 meter level, the practical incompatibilities
>> between IDNA2003 and IDNA2008 are very few. In particular:
>> -- Strings containing symbols, punctuation, etc., are generally
>> invalid under IDNA2008 and were generally valid under IDNA2003.
>> You might plausibly want to recommend that an implementation
>> ignore IDNA2008 and look strings containing them up anyway (I
>> might oppose that, but it doesn't make it less plausible) but
>> various mapping strategies have nothing to do with that. FWIW,
>> IDNA2008 actually permits those lookups if the application
>> receives an A-label. Of course, a subset of such strings may
>> present more opportunities for attacks on users than the
>> character confusion that gets all the attention, so it might not
>> be a wildly good idea to accept them.
>> -- Whether or not mapping occurs is not an incompatibility
>> between IDNA2003 and IDNA2008. IDNA2003 requires a particular
>> mapping; IDNA2008 permits, but does not require, mapping. If
>> you decide to map, there are at least two sets of
>> recommendations as to how to do so: the ones represented in UTS
>> 46 and the ones represented in RFC 5895. If you choose one or
>> the other, I think you had best be prepared to defend the choice
>> during Last Call. IMO, "MAY implement either UTS 46 or RFC
>> 5895" would be much more appropriate than ignoring one or the
>> other.
> I agree that "MAY implement either UTS 46 or RFC 5895" is more appropriate.
Would one of you be willing to propose a specific wording for that paragraph?
>> -- There are issues involving characters that IDNA2003 mapped to
>> nothing, particularly ZWJ and ZWNJ. You can't have it both
>> ways: if you choose the "map to nothing" option, names that are
>> legitimate for registration under IDNA2008 (and that involve
>> important distinctions in some languages) become inaccessible
>> and you risk some false positives with all of the advantages
>> that provides for attackers. My own recommendation would be
>> to adopt the IDNA2008 handling for those characters as soon as
>> feasible regardless of whether one implements the rest of
>> IDNA2008 or not, regardless of whether one applies some
>> particular set of mappings or not, etc.
>> -- There are two other problem characters that are interpreted
>> differently in U-label to A-label conversion in IDNA2003 and
>> IDNA2008. If you are going to take a position on how they
>> should be handled, IMO you really should discuss the issues (or
>> point to something that does), not do some handwaving about UTS
>> 46 mappings.
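
The "map to nothing" behaviour described above can be observed with a short sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 [RFC3490]; the helper names are hypothetical. The message does not name the "two other problem characters"; the sketch assumes they are ß (U+00DF) and Greek final sigma ς (U+03C2), the characters UTS 46 lists alongside ZWJ and ZWNJ as deviations.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def idna2003_to_ascii(label: str) -> bytes:
    """Hypothetical helper: IDNA2003 ToASCII via the stdlib "idna" codec."""
    return label.encode("idna")


def collapses_under_idna2003(a: str, b: str) -> bool:
    """True if two distinct labels end up as the same ASCII label."""
    return idna2003_to_ascii(a) == idna2003_to_ascii(b)


if __name__ == "__main__":
    # IDNA2003 maps ZWNJ and ZWJ to nothing, so both labels become "ab".
    print(collapses_under_idna2003("a" + ZWNJ + "b", "ab"))
    print(collapses_under_idna2003("a" + ZWJ + "b", "ab"))
    # Assumed "other problem characters": IDNA2003 maps them to other letters.
    print(idna2003_to_ascii("straße"))
    print(collapses_under_idna2003("ς", "σ"))
```

Under this codec, a name registered with a joiner is indistinguishable at lookup time from the same name without it, which is the inaccessibility and false-positive risk the message describes.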
> I think part of the question here is: what will be source and format of
> the inputs to the canonicalization algorithm described in this I-D? And
> how will the outputs be used?
> Peter Saint-Andre