referencing IDNA2008 (and IDNA2003?)
Peter Saint-Andre
stpeter at stpeter.im
Thu Oct 21 21:02:26 CEST 2010
On 10/21/10 11:56 AM, John C Klensin wrote:
> First, Alexey and Peter: as you are aware, this is at least the
> third time lately that the question of "how to reference
> IDNA2008 in context from a document that originally referenced
> IDNA2003" has come up. We probably need to figure out how to
> establish an Area-wide (or IETF-wide) strategy, rather than
> going through it one WG or document at a time.
Agreed.
> Comments on Jeff's note/proposal inline below...
>
> --On Thursday, October 21, 2010 10:12 -0700 "=JeffH"
> <Jeff.Hodges at KingsMountain.com> wrote:
>
>> Hi,
>>
>> In the httpstate WG, we've almost completed our spec on HTTP
>> Cookies (as they actually are implemented & deployed). In the
>> process, we've attempted to properly reference the IDNA specs,
>> but of course with the recent publication of IDNA2008 and its
>> obsoleting IDNA2003, this presented a bit of head-scratching
>> WRT how to properly reference them since IDNA2008 is not
>> backwards compatible and there's going to be a transition
>> period for some time.
>>
>> Below's what we came up with (the following are relevant
>> excerpts from
>> <http://tools.ietf.org/html/draft-ietf-httpstate-cookie>).
>>
>> How does that look to you folks?
>>
>> thanks,
>>
>> =JeffH
>> [ httpstate chair & document shepherd ]
>>
>>
>> ...
>>
>> 5.1.2. Canonicalized host names
>>
>> A canonicalized host name is the string generated by the
>> following algorithm:
>>
>> 1. Convert the host name to a sequence of NR-LDH labels (see
>> Section 2.3.2.2 of [RFC5890]) and/or A-labels according to the
>> appropriate IDNA specification [RFC5891] or [RFC3490] (see
>> Section 6.3 of this specification)
>
> I don't have time to look right now, but you probably don't want
> "NR-LDH labels". I will try to check in the next day or so if
> no one else gets to it.
I think this spec refers to NR-LDH labels here because A-labels are for
use by IDNs, but if the input to this canonicalization algorithm is a
traditional domain name (containing no IDN labels) then there is no need
to prefix the string with "xn--" on the way to producing an A-label --
if you don't need an A-label, an NR-LDH label will do.
> I'll let Andrew Sullivan or someone else comment further, but
> "host name" may or may not mean what you think it does.
It never does.
>> 2. Convert the labels to lower case.
>
> I was initially confused by that. I think you want "Convert the
> labels resulting from Step 1" or something like that to make it
> absolutely clear that you are not talking about a lower-case
> conversion of non-ASCII strings.
Yes, that's better.
>> 3. Concatenate the labels, separating each label from the next
>> with a %x2E (".") character.
>> ...
>
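To make the three steps concrete, here is a minimal sketch in Python. It is not taken from the draft. It assumes the IDNA2003 path only, because Python's built-in "idna" codec implements RFC 3490; an IDNA2008 implementation would need a different conversion in step 1. The function name canonicalize_host is hypothetical, and ASCII labels are passed through without checking that they are valid LDH labels.

```python
def canonicalize_host(host: str) -> str:
    """Sketch of the draft's three canonicalization steps (IDNA2003 path only)."""
    # Step 1: convert each label to an ASCII form. Labels that are already
    # ASCII are kept as-is (assumption: no LDH validation is done here);
    # non-ASCII labels go through the stdlib "idna" codec (RFC 3490 ToASCII).
    ascii_labels = []
    for label in host.split("."):
        if label.isascii():
            ascii_labels.append(label)
        else:
            ascii_labels.append(label.encode("idna").decode("ascii"))

    # Step 2: lower-case the labels resulting from step 1. At this point
    # every label is ASCII, so no non-ASCII case conversion takes place.
    lowered = [label.lower() for label in ascii_labels]

    # Step 3: concatenate the labels, separated by %x2E (".").
    return ".".join(lowered)


print(canonicalize_host("WWW.Example.COM"))
print(canonicalize_host("B\u00fccher.Example"))
```

Doing the lower-casing only after step 1 is what John's suggested rewording of step 2 makes explicit.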
>> 6.3. IDNA dependency and migration
>>
>> IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
>> backwards-compatible. For this reason, there will be a
>> transition period (possibly of a number of years). User agents
>> SHOULD implement IDNA2008 [RFC5890] and MAY implement [Unicode
>> Technical Standard #46 <http://unicode.org/reports/tr46/>] in
>> order to facilitate a smoother IDNA transition. If a user agent
>> does not implement IDNA2008, the user agent MUST implement
>> IDNA2003 [RFC3490].
>
> That paragraph has the odor of FUD. Please understand that, at
> a 10000 meter level, the number of practical incompatibilities
> between IDNA2003 and IDNA2008 is very small. In particular:
>
> -- Strings containing symbols, punctuation, etc., are generally
> invalid under IDNA2008 and were generally valid under IDNA2003.
> You might plausibly want to recommend that an implementation
> ignore IDNA2008 and look strings containing them up anyway (I
> might oppose that, but it doesn't make it less plausible) but
> various mapping strategies have nothing to do with that. FWIW,
> IDNA2008 actually permits those lookups if the application
> receives an A-label. Of course, a subset of such strings may
> present more opportunities for attacks on users than the
> character confusion that gets all the attention, so it might not
> be a wildly good idea to accept them.
>
> -- Whether or not mapping occurs is not an incompatibility
> between IDNA2003 and IDNA2008. IDNA2003 requires a particular
> mapping; IDNA2008 permits, but does not require, mapping. If
> you decide to map, there are at least two sets of
> recommendations as to how to do so: the ones represented in UTR
> 46 and the ones represented in RFC 3895. If you choose one or
> the other, I think you had best be prepared to defend the choice
> during Last Call. IMO, "MAY implement either UTF 46 or RFC
> 3895" would be much more appropriate than ignoring one or the
> other.
s/UTF/UTS/
s/3895/5895/
I agree that "MAY implement either UTS 46 or RFC 5895" is more appropriate.
> -- There are issues involving characters that IDNA2003 mapped to
> nothing, particularly ZWJ and ZWNJ. You can't have it both
> ways: if you choose the "map to nothing" option, names that are
> legitimate for registration under IDNA2008 (and that involve
> important distinctions in some languages) become inaccessible
> and you risk some false positives with all of the advantages
> that provides for attackers. My own recommendation would be
> to adopt the IDNA2008 handling for those characters as soon as
> feasible regardless of whether one implements the rest of
> IDNA2008 or not, regardless of whether one applies some
> particular set of mappings or not, etc.
>
> -- There are two other problem characters that are interpreted
> differently in U-label to A-label conversion in IDNA2003 and
> IDNA2008. If you are going to take a position on how they
> should be handled, IMO you really should discuss the issues (or
> point to something that does) not do some handwaving about UTR
> 46 mappings.
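The IDNA2003 side of the last two points can be observed with Python's built-in "idna" codec, which implements RFC 3490 with Nameprep. The sketch below is only an illustration: the labels are made up, and the reading that the "two other problem characters" are the sharp s (U+00DF) and the Greek final sigma (U+03C2) is my assumption based on RFC 5894, not something stated in the quoted text. Only the sharp s is exercised here.

```python
# Python's stdlib "idna" codec follows IDNA2003 (RFC 3490 plus Nameprep).

# Hypothetical label containing ZERO WIDTH NON-JOINER (U+200C).
# Nameprep maps it to nothing, so the label collapses to plain "ab".
zwnj_ascii = "a\u200cb".encode("idna")
plain_ascii = "ab".encode("idna")
print(zwnj_ascii, plain_ascii)  # both are b'ab'

# Sharp s (U+00DF) is case-folded to "ss" under IDNA2003, so the
# original spelling cannot be told apart from "strasse" afterwards.
sharp_s_ascii = "stra\u00dfe".encode("idna")
print(sharp_s_ascii)  # b'strasse'
```

Under IDNA2008 these characters are permitted in labels (ZWJ and ZWNJ only in certain contexts), so the corresponding names would yield distinct A-labels rather than collapsing as above.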
I think part of the question here is: what will be the source and format of
the inputs to the canonicalization algorithm described in this I-D? And
how will the outputs be used?
Peter
--
Peter Saint-Andre
https://stpeter.im/