referencing IDNA2008 (and IDNA2003?)

Thu Oct 21 21:25:29 CEST 2010

On Thu, Oct 21, 2010 at 12:02 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
> On 10/21/10 11:56 AM, John C Klensin wrote:
>> First, Alexey and Peter: as you are aware, this is at least the
>> third time lately that the question of "how to reference
>> IDNA2008 in context from a document that originally referenced
>> IDNA2003" has come up.   We probably need to figure out how to
>> establish an Area-wide (or IETF-wide) strategy, rather than
>> going through it one WG or document at a time.
>
> Agreed.
>
>> Comments on Jeff's note/proposal inline below...
>>
>> --On Thursday, October 21, 2010 10:12 -0700 "=JeffH"
>> <Jeff.Hodges at KingsMountain.com> wrote:
>>
>>> Hi,
>>>
>>> In the httpstate WG, we've almost completed our spec on HTTP
>>> Cookies (as they actually are implemented & deployed). In the
>>> process, we've attempted to properly reference the IDNA specs,
>>> but of course with the recent publication of IDNA2008 and its
>>> obsoleting IDNA2003, this presented a bit of head-scratching
>>> WRT how to properly reference them since IDNA2008 is not
>>> backwards compatible and there's going to be a transition
>>> period for some time.
>>>
>>> Below's what we came up with (the following are relevant
>>> excerpts from
>>> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>).
>>>
>>> How does that look to you folks?
>>>
>>> thanks,
>>>
>>> =JeffH
>>> [ httpstate chair & document shepherd ]
>>>
>>>
>>> ...
>>>
>>> 5.1.2.  Canonicalized host names
>>>
>>>     A canonicalized host name is the string generated by the following
>>>     algorithm:
>>>
>>>     1.  Convert the host name to a sequence of NR-LDH labels (see Section
>>>         2.3.2.2 of [RFC5890]) and/or A-labels according to the
>>>         appropriate IDNA specification [RFC5891] or [RFC3490] (see
>>>         Section 6.3 of this specification)
>>
>> I don't have time to look right now, but you probably don't want
>> "NR-LDH labels".  I will try to check in the next day or so if
>> no one else gets to it.
>
> I think this spec refers to NR-LDH labels here because A-labels are for
> use by IDNs, but if the input to this canonicalization algorithm is a
> traditional domain name (containing no IDN labels) then there is no need
> to prefix the string with "xn--" on the way to producing an A-label --
> if you don't need an A-label, an NR-LDH label will do.
>
>> I'll let Andrew Sullivan or someone else comment further, but
>> "host name" may or may not mean what you think it does.
>
> It never does.
>
>>>     2.  Convert the labels to lower case.
>>
>> I was initially confused by that.  I think you want "Convert the
>> labels resulting from Step 1" or something like that to make it
>> absolutely clear that you are not talking about a lower-case
>> conversion of non-ASCII strings.
>
> Yes, that's better.

Done.

>>>     3.  Concatenate the labels, separating each label from the next with
>>>         a %x2E (".") character.
>>> ...
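
For concreteness, the three steps quoted above could be sketched in Python
roughly as follows.  This is only an illustration, not text from the draft:
the name canonicalize_host is made up, and the sketch assumes Python's
built-in "idna" codec, which implements IDNA2003 [RFC3490] and not IDNA2008,
so it covers only the [RFC3490] branch of step 1.  It also does no validation
of the ASCII labels it passes through.

```python
def canonicalize_host(host: str) -> str:
    """Hypothetical sketch of the quoted steps 1-3 (IDNA2003 branch only)."""
    ascii_labels = []
    for label in host.split("."):
        if label.isascii():
            # Step 1, ASCII case: keep the label as it is (no "xn--" needed).
            ascii_label = label
        else:
            # Step 1, IDN case: Python's "idna" codec applies the IDNA2003
            # ToASCII operation (nameprep, then Punycode with "xn--").
            ascii_label = label.encode("idna").decode("ascii")
        # Step 2: lower-case the ASCII result of step 1, never the raw
        # non-ASCII input.
        ascii_labels.append(ascii_label.lower())
    # Step 3: join the labels with "." (%x2E).
    return ".".join(ascii_labels)


print(canonicalize_host("WWW.Example.COM"))
print(canonicalize_host("Bücher.example"))
```

Doing the lower-casing after the conversion, label by label, is what keeps
step 2 from ever touching non-ASCII strings.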
>>
>>> 6.3.  IDNA dependency and migration
>>>
>>>     IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
>>>     backwards-compatible.  For this reason, there will be a transition
>>>     period (possibly of a number of years).  User agents SHOULD implement
>>>     IDNA2008 [RFC5890] and MAY implement [Unicode Technical Standard #46
>>>     <http://unicode.org/reports/tr46/>] in order to facilitate a smoother
>>>     IDNA transition.  If a user agent does not implement IDNA2008, the
>>>     user agent MUST implement IDNA2003 [RFC3490].
>>
>> That paragraph has the odor of FUD.  Please understand that, at
>> a 10000 meter level, the number of practical incompatibilities
>> between IDNA2003 and IDNA2008 is very small.  In particular:
>>
>> -- Strings containing symbols, punctuation, etc., are generally
>> invalid under IDNA2008 and were generally valid under IDNA2003.
>> You might plausibly want to recommend that an implementation
>> ignore IDNA2008 and look strings containing them up anyway (I
>> might oppose that, but it doesn't make it less plausible) but
>> various mapping strategies have nothing to do with that.  FWIW,
>> IDNA2008 actually permits those lookups if the application
>> receives an A-label.  Of course, a subset of such strings may
>> present more opportunities for attacks on users than the
>> character confusion that gets all the attention, so it might not
>> be a wildly good idea to accept them.
>>
>> -- Whether or not mapping occurs is not an incompatibility
>> between IDNA2003 and IDNA2008.  IDNA2003 requires a particular
>> mapping; IDNA2008 permits, but does not require, mapping.   If
>> you decide to map, there are at least two sets of
>> recommendations as to how to do so: the ones represented in UTS
>> 46 and the ones represented in RFC 3895.   If you choose one or
>> the other, I think you had best be prepared to defend the choice
>> during Last Call.  IMO, "MAY implement either UTF 46 or RFC
>> 3895" would be much more appropriate than ignoring one or the
>> other.
>
> s/UTF/UTS/
>
> s/3895/5895/
>
> I agree that "MAY implement either UTS 46 or RFC 5895" is more appropriate.

Would one of you be willing to propose a specific wording for that paragraph?

Thanks,
Adam

>> -- There are issues involving characters that IDNA2003 mapped to
>> nothing, particularly ZWJ and ZWNJ.  You can't have it both
>> ways: if you choose the "map to nothing" option, names that are
>> legitimate for registration under IDNA2008 (and that involve
>> important distinctions in some languages) become inaccessible
>> and you risk some false positives with all of the advantages
>> that provides for attackers.    My own recommendation would be
>> to adopt the IDNA2008 handling for those characters as soon as
>> feasible regardless of whether one implements the rest of
>> IDNA2008 or not, regardless of whether one applies some
>> particular set of mappings or not, etc.
>>
>> -- There are two other problem characters that are interpreted
>> differently in U-label to A-label conversion in IDNA2003 and
>> IDNA2008.  If you are going to take a position on how they
>> should be handled, IMO you really should discuss the issues (or
>> point to something that does) not do some handwaving about UTS
>> 46 mappings.
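
To make the last two points concrete, here is a small Python sketch of what
an IDNA2003 implementation does with these characters.  It assumes Python's
built-in "idna" codec (IDNA2003 only).  It also assumes that the "two other
problem characters" are the German sharp s (U+00DF) and the Greek final sigma
(U+03C2); the message above does not name them.  The helper name
idna2003_to_ascii is made up for this example.

```python
def idna2003_to_ascii(label: str) -> str:
    """Hypothetical helper: IDNA2003 ToASCII via Python's "idna" codec."""
    return label.encode("idna").decode("ascii")


# ZWNJ (U+200C) is "mapped to nothing" under IDNA2003, so two labels that
# IDNA2008 can treat as distinct collapse to the same lookup string.
with_zwnj = idna2003_to_ascii("a\u200cb")
without_zwnj = idna2003_to_ascii("ab")
print(with_zwnj == without_zwnj)

# Assumed problem characters: IDNA2003 case-folds sharp s to "ss" and final
# sigma to ordinary sigma before the Punycode step.
print(idna2003_to_ascii("stra\u00dfe"))
print(idna2003_to_ascii("\u03b1\u03c2") == idna2003_to_ascii("\u03b1\u03c3"))
```

In each case the distinction is lost before the lookup happens, which is the
inaccessibility and false-positive risk described above.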
>
> I think part of the question here is: what will be the source and format of
> the inputs to the canonicalization algorithm described in this I-D? And
> how will the outputs be used?
>
> Peter
>
> --
> Peter Saint-Andre
> https://stpeter.im/