referencing IDNA2008 (and IDNA2003?)

Thu Oct 21 21:02:26 CEST 2010

On 10/21/10 11:56 AM, John C Klensin wrote:
> First, Alexey and Peter: as you are aware, this is at least the
> third time lately that the question of "how to reference
> IDNA2008 in context from a document that originally referenced
> IDNA2003" has come up.   We probably need to figure out how to
> establish an Area-wide (or IETF-wide) strategy, rather than
> going through it one WG or document at a time.

Agreed.

> Comments on Jeff's note/proposal inline below...
> 
> --On Thursday, October 21, 2010 10:12 -0700 "=JeffH"
> <Jeff.Hodges at KingsMountain.com> wrote:
> 
>> Hi,
>>
>> In the httpstate, we've almost completed our spec on HTTP
>> Cookies (as they actually are implemented & deployed). In the
>> process, we've attempted to properly reference the IDNA specs,
>> but of course with the recent publication of IDNA2008 and its
>> obsoleting IDNA2003, this presented a bit of head-scratching
>> WRT how to properly reference them since IDNA2008 is not
>> backwards compatible and there's going to be a transition
>> period for some time.
>>
>> Below's what we came up with (the following are relevant
>> excerpts from
>> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>).
>>
>> How does that look to you folks?
>>
>> thanks,
>>
>> =JeffH
>> [ httpstate chair & document shepherd ]
>>
>>
>> ...
>>
>> 5.1.2.  Canonicalized host names
>>
>>     A canonicalized host name is the string generated by the following
>>     algorithm:
>>
>>     1.  Convert the host name to a sequence of NR-LDH labels (see
>>         Section 2.3.2.2 of [RFC5890]) and/or A-labels according to the
>>         appropriate IDNA specification [RFC5891] or [RFC3490] (see
>>         Section 6.3 of this specification)
> 
> I don't have time to look right now, but you probably don't want
> "NR-LDH labels".  I will try to check in the next day or so if
> no one else gets to it.

I think this spec refers to NR-LDH labels here because A-labels are for
use by IDNs, but if the input to this canonicalization algorithm is a
traditional domain name (containing no IDN labels) then there is no need
to prefix the string with "xn--" on the way to producing an A-label --
if you don't need an A-label, an NR-LDH label will do.
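
To make the distinction concrete, here is a rough sketch of what I mean. It
assumes Python and its built-in "idna" codec, which implements IDNA2003
(RFC 3490) rather than IDNA2008, so it only stands in for "the appropriate
IDNA specification". The helper name to_ascii_label is hypothetical.

```python
# Sketch only: the stdlib "idna" codec follows IDNA2003 (RFC 3490),
# not IDNA2008. to_ascii_label is a hypothetical helper name.

def to_ascii_label(label: str) -> str:
    """Return the ASCII form of a single label."""
    return label.encode("idna").decode("ascii")

# A traditional LDH label needs no conversion and gets no "xn--" prefix.
print(to_ascii_label("example"))

# A label with non-ASCII characters is converted to an A-label.
print(to_ascii_label("bücher"))
```

The first call returns its input unchanged; only the second produces a
string that starts with "xn--".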

> I'll let Andrew Sullivan or someone else comment further, but
> "host name" may or may not mean what you think it does.

It never does.

>>     2.  Convert the labels to lower case.
> 
> I was initially confused by that.  I think you want "Convert the
> labels resulting from Step 1" or something like that to make it
> absolutely clear that you are not talking about a lower-case
> conversion of non-ASCII strings.

Yes, that's better.

>>     3.  Concatenate the labels, separating each label from the next
>>         with a %x2E (".") character.
>> ...
> 
>> 6.3.  IDNA dependency and migration
>>
>>     IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
>>     backwards-compatible.  For this reason, there will be a transition
>>     period (possibly of a number of years).  User agents SHOULD
>>     implement IDNA2008 [RFC5890] and MAY implement [Unicode Technical
>>     Standard #46 <http://unicode.org/reports/tr46/>] in order to
>>     facilitate a smoother IDNA transition.  If a user agent does not
>>     implement IDNA2008, the user agent MUST implement IDNA2003
>>     [RFC3490].
> 
> That paragraph has the odor of FUD.  Please understand that, at
> a 10000 meter level, the number of practical incompatibilities
> between IDNA2003 and IDNA2008 is very small.  In particular:
> 
> -- Strings containing symbols, punctuation, etc., are generally
> invalid under IDNA2008 and were generally valid under IDNA2003.
> You might plausibly want to recommend that an implementation
> ignore IDNA2008 and look strings containing them up anyway (I
> might oppose that, but it doesn't make it less plausible) but
> various mapping strategies have nothing to do with that.  FWIW,
> IDNA2008 actually permits those lookups if the application
> receives an A-label.  Of course, a subset of such strings may
> present more opportunities for attacks on users than the
> character confusion that gets all the attention, so it might not
> be a wildly good idea to accept them.
> 
> -- Whether or not mapping occurs is not an incompatibility
> between IDNA2003 and IDNA2008.  IDNA2003 requires a particular
> mapping; IDNA2008 permits, but does not require, mapping.   If
> you decide to map, there are at least two sets of
> recommendations as to how to do so: the ones represented in UTR
> 46 and the ones represented in RFC 3895.   If you choose one or
> the other, I think you had best be prepared to defend the choice
> during Last Call.  IMO, "MAY implement either UTF 46 or RFC
> 3895" would be much more appropriate than ignoring one or the
> other.

s/UTF/UTS/

s/3895/5895/

I agree that "MAY implement either UTS 46 or RFC 5895" is more appropriate.

> -- There are issues involving characters that IDNA2003 mapped to
> nothing, particularly ZWJ and ZWNJ.  You can't have it both
> ways: if you choose the "map to nothing" option, names that are
> legitimate for registration under IDNA2008 (and that involve
> important distinctions in some languages) become inaccessible
> and you risk some false positives with all of the advantages
> that provides for attackers.    My own recommendation would be
> to adopt the IDNA2008 handling for those characters as soon as
> feasible regardless of whether one implements the rest of
> IDNA2008 or not, regardless of whether one applies some
> particular set of mappings or not, etc.
> 
> -- There are two other problem characters that are interpreted
> differently in U-label to A-label conversion in IDNA2003 and
> IDNA2008.  If you are going to take a position on how they
> should be handled, IMO you really should discuss the issues (or
> point to something that does) not do some handwaving about UTR
> 46 mappings.
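
On the ZWJ/ZWNJ point, a small illustration may help readers who have not
looked at the IDNA2003 tables. This is only a sketch, assuming Python's
stdlib stringprep module (the RFC 3454 tables that IDNA2003's Nameprep
relies on); strip_mapped_to_nothing is a hypothetical helper, not part of
any specification.

```python
import stringprep

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# RFC 3454 table B.1 lists characters "commonly mapped to nothing".
for ch in (ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X} in table B.1:", stringprep.in_table_b1(ch))

def strip_mapped_to_nothing(label: str) -> str:
    """Drop every table B.1 character (hypothetical helper)."""
    return "".join(c for c in label if not stringprep.in_table_b1(c))

# Two labels that differ only by ZWNJ become indistinguishable.
collide = strip_mapped_to_nothing("a" + ZWNJ + "b") == strip_mapped_to_nothing("ab")
print("labels collide after mapping:", collide)
```

That collision is the "can't have it both ways" problem: once the joiner
is mapped away, a name registered with it cannot be reached separately
from the name without it.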

I think part of the question here is: what will be the source and format
of the inputs to the canonicalization algorithm described in this I-D?
And how will the outputs be used?
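
For discussion purposes, the three steps quoted from Section 5.1.2 could be
sketched roughly as follows. This assumes the input is a Unicode string
whose labels are separated by "." only, and it uses Python's stdlib "idna"
codec, which implements IDNA2003 (RFC 3490), not IDNA2008. The function
name canonicalize_host is hypothetical and is not taken from the draft.

```python
def canonicalize_host(host: str) -> str:
    """Sketch of the draft's three steps (hypothetical helper)."""
    # Step 1: convert each label to its ASCII form. ASCII labels pass
    # through unchanged; others become A-labels. Assumption: IDNA2003
    # via the stdlib codec, and "." as the only label separator.
    labels = [label.encode("idna").decode("ascii") for label in host.split(".")]

    # Step 2: lower-case the labels resulting from Step 1. They are
    # pure ASCII at this point, so no non-ASCII case mapping happens.
    labels = [label.lower() for label in labels]

    # Step 3: concatenate the labels with %x2E (".") between them.
    return ".".join(labels)

print(canonicalize_host("EXAMPLE.Com"))
print(canonicalize_host("Bücher.Example"))
```

Note that the codec does not lower-case labels that are already ASCII, so
the explicit second step still does real work. Whether the input arrives
as Unicode, as A-labels, or as arbitrary bytes from a Set-Cookie header is
exactly what the sketch has to assume and what the I-D ought to say.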

Peter

-- 
Peter Saint-Andre
https://stpeter.im/
