Updating RFC 5890-5893 (IDNA 2008) to Full Standard

Mon Nov 12 13:49:03 CET 2012

On Thu, Nov 8, 2012 at 4:26 PM, John C Klensin <klensin at jck.com> wrote:
> Folks, this is not a discussion I can have in a thoughtful way
> this week so, while there are a few comments below, I won't be
> able to respond to this thread any more until late next week.

No worries.

> --On Thursday, 08 November, 2012 10:05 +0100 Anne van Kesteren
> <annevk at annevk.nl> wrote:
>> I'm not sure what Internet Explorer does, but of the other
>> browsers only Opera implements IDNA2008 (and does not do it
>> per the recommendations of UTS #46, and is probably
>> incompatible with deployed content and needs to change).
>
> See the response I just sent to Martin about UTR 46 and note
> that it is _not_ part of IDNA2008.  What may be more relevant than
> what the browsers are or are not doing is that several
> registries are eliminating labels that are not
> IDNA2008-conformant either at renewal time or earlier.  So, if
> "content" contains non-conformant strings, there is a rising
> danger of non-resolution regardless of what browsers do.

It seems bad to kill resources such as http://xn--ls8h.la/ (U+1F4A9)
via such a process. I know the IETF publication process does not
subscribe to http://www.w3.org/Provider/Style/URI.html (although
tools.ietf.org fortunately does), but actively encouraging the
deletion of domain names seems harmful. Registries however are of no
help when it comes to resources such as http://xn--74h.damowmow.com/
(U+263A).
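
For readers who want to check which code points sit behind the two
"xn--" labels above, here is a minimal sketch using Python's standard
"punycode" codec. The helper name decode_label is made up for
illustration; it only strips the ACE prefix and decodes, and performs
no IDNA validity checks of any kind.

```python
# Decode the ACE ("xn--") labels mentioned above with the stdlib Punycode codec.
ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Return the Unicode form of one DNS label (unchanged if it is not ACE)."""
    if label.lower().startswith(ACE_PREFIX):
        # Hypothetical helper: raw Punycode decoding only, no IDNA validation.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for host in ("xn--ls8h.la", "xn--74h.damowmow.com"):
    first = decode_label(host.split(".")[0])
    print(host, "->", " ".join(f"U+{ord(ch):04X}" for ch in first))
```

Both labels decode to a single symbol character (U+1F4A9 and U+263A
respectively), which is the kind of label IDNA2008 no longer permits.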

> But I don't understand how you tell by testing that a browser
> has implemented IDNA2008.  Even in the notorious Eszett case,
> upper-case Eszett was an undefined code point in IDNA2003.

By that I guess you mean that you did not expect implementations to
update their Unicode support after IDNA2003 was released?

> Now, if browsers implemented a made-up IDNA2005 (i.e., IDNA2003
> with a version of Stringprep that they guessed at without any
> standardization process or with a Unicode 5.x version of case
> mapping) then there is an incompatibility with IDNA2008.  But
> that isn't non-compliance with IDNA2003.

It does mean that running code and its documentation are way out of
sync. It seems extremely weird to expect implementations to keep their
version of Unicode frozen for the purposes of IDNA2003. (I thought
that not being able to freeze Unicode (and really any other externally
referenced modular concept) was common knowledge...) It also seems
weird that when IDNA2008 was created running code was not taken into
account.

> On the other hand, if the user types Eszett in lower case,
> IDNA2003 maps it to "ss".  IDNA2008 doesn't prohibit that... it
> just says that a label containing it isn't a U-label and that
> what the lookup application does is up to it.  It strongly
> implies that a warning message would be appropriate, but doesn't
> require even that.

I'm not too concerned with user input. We cannot standardize UI. What
I care about is that IDNA2008 changed the meaning of e.g.

  <a href="http://faß.de">test</a>

in HTML. Or at best IDNA2008 made what that means undefined, since I
could define "local needs" for HTML I suppose and make it work like it
did before. Which is just crazy. Copying and pasting URLs around should
not change their meaning.
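
To make the change in meaning concrete, here is a rough sketch in
Python. It assumes the standard library's "idna" codec, which
implements the IDNA2003 ToASCII operation (RFC 3490 with Nameprep),
and contrasts it with a hypothetical helper, unmapped_ace_host, that
Punycode-encodes each label without any mapping. The second function
is not an IDNA2008 implementation (no validity, NFC or Bidi checks);
it only shows the A-label that results when the sharp s is kept.

```python
def idna2003_host(host: str) -> str:
    """IDNA2003 ToASCII via the stdlib codec (Nameprep maps U+00DF to "ss")."""
    return host.encode("idna").decode("ascii")


def unmapped_ace_host(host: str) -> str:
    """Hypothetical helper: Punycode each non-ASCII label with no mapping step."""
    out = []
    for label in host.split("."):
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


host = "fa\u00DF.de"  # the host from the href above
print(idna2003_host(host))      # fass.de
print(unmapped_ace_host(host))  # xn--fa-hia.de
```

The same string in the href therefore names two different DNS hosts,
depending on which of the two behaviours the consumer applies.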

Another example that IDNA2008 made undefined: domain label separators.
All browsers support the four listed by IDNA2003 (see
http://mathias.html5.org/tests/url/idna2003-separators/ for a test)
and there is surely content out there that relies on them.
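
For reference, the four separators are the ones RFC 3490 section 3.1
tells implementations to treat as dots: U+002E, U+3002, U+FF0E and
U+FF61. Below is a minimal sketch of splitting a host on them; the
name split_labels is invented for illustration, and real URL parsers
do considerably more than this.

```python
import re

# The four label separators recognised by IDNA2003 (RFC 3490, section 3.1):
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
IDNA2003_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(host: str) -> list[str]:
    """Hypothetical helper: split a host into labels on any IDNA2003 dot."""
    return IDNA2003_DOTS.split(host)


print(split_labels("www\u3002example\uFF0Ecom"))  # ['www', 'example', 'com']
```

An implementation that only splits on U+002E would treat the host in
the example as one single label.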

-- 
http://annevankesteren.nl/