Havoc (was: Issues lists and the "preprocessing" topic)

Martin Duerst duerst at it.aoyama.ac.jp
Thu Aug 28 07:33:42 CEST 2008


[trying to reformat a "flowed" mail, sorry if I got it wrong]

Hello Frank, others,

At 13:39 08/08/28, Frank Ellermann wrote:
>Mark Davis wrote:
>
>> http://docs.google.com/Doc?id=dfqr8rd5_51c3nrskcx
>
>(4) "works" is misleading, it is wrong if it "works".
>
>It is a VERY DANGEROUS BUG in RFC 3987,

Sorry, but the "bug", if it were a bug, is in RFC 3986,
NOT in RFC 3987. I seem to remember having told you
that several times already, but maybe you forgot.

http://B%C3%BCcher.de is a totally legal *U*RI according
to RFC 3986 (STD 66!), and per that STD maps to
http://xn--bcher-kva.de. The paragraph just
before Section 3.2.3 (last paragraph on p. 21) says this:

   The reg-name syntax allows percent-encoded octets in order to
   represent non-ASCII registered names in a uniform way that is
   independent of the underlying name resolution technology.  Non-ASCII
   characters must first be encoded according to UTF-8 [STD63], and then
   each octet of the corresponding UTF-8 sequence must be percent-
   encoded to be represented as URI characters.  URI producing
   applications must not use percent-encoding in host unless it is used
   to represent a UTF-8 character sequence.  When a non-ASCII registered
   name represents an internationalized domain name intended for
   resolution via the DNS, the name must be transformed to the IDNA
   encoding [RFC3490] prior to name lookup.  URI producers should
   provide these registered names in the IDNA encoding, rather than a
   percent-encoding, if they wish to maximize interoperability with
   legacy URI resolvers.
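
To make the mapping described in that paragraph concrete, here is a
minimal Python sketch. The function name reg_name_to_idna is my own
(hypothetical), and I am assuming Python's built-in "idna" codec, which
implements the RFC 3490 conversion, as a stand-in for whatever an
actual resolver would use.

```python
from urllib.parse import unquote


def reg_name_to_idna(reg_name: str) -> str:
    """Map a percent-encoded reg-name to its IDNA (RFC 3490) form.

    Hypothetical helper for illustration only; a real URI resolver
    would also need to handle IP literals, ports, and error cases.
    """
    # Step 1: undo the percent-encoding, reading the octets as UTF-8.
    # errors="strict" rejects octet sequences that are not valid UTF-8.
    unicode_name = unquote(reg_name, encoding="utf-8", errors="strict")
    # Step 2: apply the IDNA encoding label by label.
    return unicode_name.encode("idna").decode("ascii")


print(reg_name_to_idna("B%C3%BCcher.de"))
```

With the host from the example above, this should print
xn--bcher-kva.de.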


>incompatible with the spirit of STD 63 and DNS.

Sorry, but I don't see how the spirit of STD 63 or DNS
would have any say on percent-escaping in URIs. Can you
explain?


>The label "B%C3%BCcher" could exist, not at the
>level in your example, because DENIC won't allow
>it, but at other levels or in other zones.

Yes, that label of course could exist. But it would have
to be encoded as "B%25C3%25BCcher" in a URI/IRI.

[Back to the basics of escaping: Don't forget to escape the
escape character!]
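
A small Python sketch of that point, using urllib.parse from the
standard library. The variable names are just for illustration; the
only assumption is that the literal label really contains the
characters '%', 'C', '3', and so on.

```python
from urllib.parse import quote, unquote

# The literal DNS label, containing real '%' characters.
literal_label = "B%C3%BCcher"

# Escaping for use in a URI: '%' itself becomes '%25'.
escaped = quote(literal_label, safe="")
print(escaped)

# One round of unescaping gives back the literal label ...
round_tripped = unquote(escaped)

# ... whereas unescaping the literal label directly would instead
# yield the non-ASCII name, which is a different thing altogether.
misread = unquote(literal_label)
print(round_tripped, misread)
```

The first line printed should be B%25C3%25BCcher, matching the form
given above.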

Also, it's not fully clear to me whether it would indeed be
resolvable (some applications may use ToASCII, which I think
would choke on it; others might just send it off after
unescaping because they wouldn't detect any non-ASCII, ...),
but in this discussion, that's a detail.


>And it is totally unrelated to "xn--bcher-kva", 
>unless the owner of "B%C3%BCcher" 
>arranges it as alias for "xn--bcher-kva".

So much is correct.


>IMO the IE7 behaviour "doesn't work" is perfect,
>the FF3 / safari behaviour is completely broken.

The FF3 and Safari behaviors are correct; the IE7
behavior is outdated.


Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


