referencing IDNA2008 (and IDNA2003?)

Fri Oct 22 22:18:49 CEST 2010

--On Friday, October 22, 2010 13:05 -0700 Adam Barth
<ietf at adambarth.com> wrote:

>>> I believe so.  Or rather, I think they'll need to be to work
>>> properly today.  We could run some experiments to be sure,
>>> but I was told that putting non-ASCII characters in HTTP
>>> headers is bad news bears.
>> 
>> That is certainly true for other reasons but, as I am
>> regularly reminded, it doesn't mean that no one is doing it
>> and that no HTTP server is letting them get away with it.  
>> I think the world would be much better served in the long run
>> --in terms of stability and predictable behavior-- if you
>> could take the position that all cookie contents and
>> communication about them took place using either traditional
>> LDH (ASCII) forms or A-labels (note that means, e.g., no
>> ASCII punctuation either). But whether it is plausible to do
>> that at this stage is presumably something the WG will need
>> to consider, including examining whether the "resolution
>> other than via the public DNS" considerations discussed in
>> draft-iab-i18n-encoding are relevant.
> 
> We're not talking about cookie contents.  We're talking about
> the domain-attribute.  The document requires servers to emit
> domain-attribute in the xn-- form or else the user agent will
> ignore their cookies.

Good.

>>> This text seems symmetric.  The user agent needs to know
>>> that it should not apply the U-label => A-label conversion
>>> to the domain-attribute but that it should apply the
>>> conversions to the request-host.
>> 
>> Not sure I understand.  If it is going to make a comparison,
>> both strings have to be in the same label form.  So, if
>> either the domain-attribute or the request-host contain
>> non-ASCII characters, it needs to convert those strings to
>> A-labels (IDNA2008) or via ToASCII (IDNA2003).
> 
> Maybe I misread Peter's text, but his text was symmetric
> w.r.t. X and Y.  However, the behavior is not symmetric w.r.t.
> X and Y, so something needs to break the symmetry.  The
> current document breaks the symmetry by applying various IDNA
> algorithms to Y but not to X.

Ok, that should work.

>> Or, it can take a look at the strings, discover whether there
>> are non-ASCII characters present, and, if they are, simply
>> reject that putative label as bogus.  I have a personal
>> preference (partially based on the "private encoding" theme of
>> draft-iab-i18n-encoding and the small risk of a false
>> positive) but, again, that is a decision the WG should make
>> or leave to the UA.
> 
> No such sniffing is required.

You have to sniff a little bit.  The problem is that, if you
apply "various IDNA algorithms to Y", the effective first step
under IDNA2008 is to determine whether there are non-ASCII
characters present.  If there are not, there is no IDNA
algorithm to apply.  Now you could build a perfectly conforming
IDNA2008 implementation that would behave pretty much like
ToASCII does in that regard, i.e., return the plain-ASCII string
if the input was plain ASCII, return the A-label if the string
was a U-label, and return a failure indication otherwise.   But,
as far as the Standard is concerned, you cannot apply the IDNA2008
U-label to A-label conversion to an all-ASCII string.
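
To make the three outcomes concrete, here is a rough Python sketch of
such a ToASCII-like wrapper.  The function name and the validity check
are assumptions for illustration only: the check is a crude stand-in
for the real IDNA2008 rules (no Bidi, CONTEXTJ/CONTEXTO, or code point
category tests), and only the standard-library "punycode" codec is
used.

```python
import unicodedata


def to_ascii_like(label: str) -> str:
    """Hypothetical helper, not taken from any draft or RFC.

    Returns the label unchanged if it is all ASCII, returns an A-label
    if it looks like a U-label, and raises ValueError otherwise.
    """
    # The "sniff": with no non-ASCII characters present, there is no
    # IDNA conversion to apply, so hand the string back as it came in.
    if label.isascii():
        return label

    # Crude stand-in for U-label validity (assumption): the label must
    # be in NFC and must not change when lowercased.
    if unicodedata.normalize("NFC", label) != label or label.lower() != label:
        raise ValueError(f"not a plausible U-label: {label!r}")

    # Punycode-encode and add the ACE prefix to form the A-label.
    a_label = "xn--" + label.encode("punycode").decode("ascii")
    if len(a_label) > 63:
        raise ValueError(f"A-label longer than 63 octets: {label!r}")
    return a_label
```

A caller can then treat the result as either an LDH string or an
A-label, and treat ValueError as the failure indication.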

>>> That approach doesn't meet requirement (5).  In particular,
>>> this text uses IDNA-folding comparisons instead of first
>>> canonicalizing and then applying octet-by-octet comparisons.
>> 
>> Yep.  And that is exactly the argument for forcing any
>> cookie-related string into A-label form as early as possible
>> and keeping it that way, rather than having IDNs as A-labels,
>> U-labels, and assorted nonsense that is neither, floating
>> around. If you know everything is either an A-label, an LDH
>> string, or an error, then life gets a whole lot easier.
> 
> I believe that's what the current document does.

Excellent.  Glad to know we've ended up on the same page.
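
For the record, a minimal Python sketch of the asymmetric handling
discussed above: the domain-attribute is accepted only if it is already
ASCII (LDH or xn-- form), while the request-host is canonicalized before
an octet-by-octet comparison.  The function names are hypothetical, and
the standard-library "idna" codec used here implements IDNA2003 ToASCII,
not IDNA2008, so this illustrates the shape of the comparison rather
than the document's exact algorithm.

```python
def canonical_request_host(host: str) -> bytes:
    """Hypothetical helper: canonicalize the request-host to ASCII octets.

    Assumption: the stdlib "idna" codec (IDNA2003 ToASCII) is applied
    label by label; ASCII labels are lowercased separately afterwards.
    """
    return host.encode("idna").lower()


def attribute_equals_host(domain_attribute: str, request_host: str) -> bool:
    """Hypothetical helper: compare canonical forms octet by octet."""
    # No conversion is applied to the domain-attribute: a non-ASCII
    # value is not in xn-- form, so the cookie would be ignored.
    if not domain_attribute.isascii():
        return False
    try:
        host_octets = canonical_request_host(request_host)
    except UnicodeError:
        # The request-host could not be converted; treat as no match.
        return False
    return domain_attribute.lower().encode("ascii") == host_octets
```

With this arrangement only the request-host side ever goes through a
conversion, and the final comparison is plain byte equality.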

   john