referencing IDNA2008 (and IDNA2003?)

Thu Oct 21 22:08:04 CEST 2010

On 10/21/10 1:25 PM, Adam Barth wrote:
> On Thu, Oct 21, 2010 at 12:02 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
>> On 10/21/10 11:56 AM, John C Klensin wrote:

<snip/>

>>>> 6.3.  IDNA dependency and migration
>>>>
>>>>     IDNA2008 [RFC5890] supersedes IDNA2003 [RFC3490] but is not
>>>>     backwards-compatible.  For this reason, there will be a
>>>>     transition period (possibly of a number of years).  User agents
>>>>     SHOULD implement IDNA2008 [RFC5890] and MAY implement [Unicode
>>>>     Technical Standard #46 <http://unicode.org/reports/tr46/>] in
>>>>     order to facilitate a smoother IDNA transition.  If a user agent
>>>>     does not implement IDNA2008, the user agent MUST implement
>>>>     IDNA2003 [RFC3490].
>>>
>>> That paragraph has the odor of FUD.  Please understand that, at
>>> a 10000 meter level, the number of practical incompatibilities
>>> between IDNA2003 and IDNA2008 is very small.  In particular:
>>>
>>> -- Strings containing symbols, punctuation, etc., are generally
>>> invalid under IDNA2008 and were generally valid under IDNA2003.
>>> You might plausibly want to recommend that an implementation
>>> ignore IDNA2008 and look strings containing them up anyway (I
>>> might oppose that, but it doesn't make it less plausible) but
>>> various mapping strategies have nothing to do with that.  FWIW,
>>> IDNA2008 actually permits those lookups if the application
>>> receives an A-label.  Of course, a subset of such strings may
>>> present more opportunities for attacks on users than the
>>> character confusion that gets all the attention, so it might not
>>> be a wildly good idea to accept them.
>>>
>>> -- Whether or not mapping occurs is not an incompatibility
>>> between IDNA2003 and IDNA2008.  IDNA2003 requires a particular
>>> mapping; IDNA2008 permits, but does not require, mapping.   If
>>> you decide to map, there are at least two sets of
>>> recommendations as to how to do so: the ones represented in UTR
>>> 46 and the ones represented in RFC 3895.   If you choose one or
>>> the other, I think you had best be prepared to defend the choice
>>> during Last Call.  IMO, "MAY implement either UTF 46 or RFC
>>> 3895" would be much more appropriate than ignoring one or the
>>> other.
>>
>> s/UTF/UTS/
>>
>> s/3895/5895/
>>
>> I agree that "MAY implement either UTS 46 or RFC 5895" is more appropriate.
> 
> Would one of you be willing to propose a specific wording for that paragraph?

I'm willing. Whether I'm competent is another question. :)

>>> -- There are issues involving characters that IDNA2003 mapped to
>>> nothing, particularly ZWJ and ZWNJ.  You can't have it both
>>> ways: if you choose the "map to nothing" option, names that are
>>> legitimate for registration under IDNA2008 (and that involve
>>> important distinctions in some languages) become inaccessible
>>> and you risk some false positives with all of the advantages
>>> that provides for attackers.    My own recommendation would be
>>> to adopt the IDNA2008 handling for those characters as soon as
>>> feasible regardless of whether one implements the rest of
>>> IDNA2008 or not, regardless of whether one applies some
>>> particular set of mappings or not, etc.
>>>
>>> -- There are two other problem characters that are interpreted
>>> differently in U-label to A-label conversion in IDNA2003 and
>>> IDNA2008.  If you are going to take a position on how they
>>> should be handled, IMO you really should discuss the issues (or
>>> point to something that does) not do some handwaving about UTR
>>> 46 mappings.
>>
>> I think part of the question here is: what will be the source and format of
>> the inputs to the canonicalization algorithm described in this I-D? And
>> how will the outputs be used?

Adam, could you perhaps shed some light on that topic? I *think* the
input here is the name of the HTTP server to which the user agent sent
its request and from which it receives a response containing a
Set-Cookie header field. The name could be an IPv4 or IPv6 address, a
mere machine name (e.g., on a local network), a traditional domain name
(containing only ASCII characters), or an internationalized domain name
(IDN). For IDNs, the user agent (or the DNS library it uses) might
support either IDNA2003 or IDNA2008, so the actual input to the c14n
algorithm might be either an A-label or a U-label. However, the output
of the algorithm is used only for internal comparison within the user
agent, not communication to another entity, so as far as I can see
(which, I admit, might not be very far!) the user agent only needs to
process the inputs consistently -- it doesn't matter all that much
whether it uses IDNA2003 (including the Nameprep profile of stringprep)
to do that or whether it uses IDNA2008 (optionally including UTS-46 or
RFC 5895 to map characters).
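
To make the "process the inputs consistently" point concrete, here is a
minimal sketch in Python. It is not the algorithm from the I-D. It
assumes the user agent settles on IDNA2003 by way of the Python standard
library's "idna" codec (which implements RFC 3490 with Nameprep), and the
function name canonicalize_host is hypothetical. An IDNA2008
implementation could be swapped in, as long as the same one is applied
to every input.

```python
import ipaddress


def canonicalize_host(host: str) -> str:
    """Hypothetical helper: reduce a host name to one comparable form.

    Assumption: the result is used only for comparison inside the user
    agent and is never sent to another entity.
    """
    # IPv4 and IPv6 literals (with or without URI-style brackets) are
    # normalized by the ipaddress module, not by IDNA.
    try:
        return str(ipaddress.ip_address(host.strip("[]")))
    except ValueError:
        pass

    # The stdlib "idna" codec applies IDNA2003 ToASCII label by label:
    # U-labels become A-labels, and ASCII-only names pass through.
    ascii_name = host.encode("idna").decode("ascii")

    # ASCII labels are compared case-insensitively, so fold case last.
    return ascii_name.lower()


if __name__ == "__main__":
    for name in ("B\u00fccher.example", "XN--BCHER-KVA.EXAMPLE",
                 "localhost", "[2001:DB8::1]"):
        print(repr(name), "->", canonicalize_host(name))
```

With this choice the U-label and A-label spellings of the same name
compare equal, which is all the cookie-matching logic needs. It also
shows the cost John describes: because IDNA2003 maps ZWNJ to nothing,
canonicalize_host("a\u200cb.example") and canonicalize_host("ab.example")
produce the same string.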

However, I am not an i18n expert so I shall pause here for further
feedback before attempting to formulate any proposed text.

Peter
