Havoc (was: Issues lists and the "preprocessing" topic)
Martin Duerst
duerst at it.aoyama.ac.jp
Thu Aug 28 07:33:42 CEST 2008
[trying to reformat a "flowed" mail, sorry if I got it wrong]
Hello Frank, others,
At 13:39 08/08/28, Frank Ellermann wrote:
>Mark Davis wrote:
>
>> http://docs.google.com/Doc?id=dfqr8rd5_51c3nrskcx
>
>(4) "works" is misleading, it is wrong if it "works".
>
>It is a VERY DANGEROUS BUG in RFC 3987,
Sorry, but the "bug", if it were a bug, is in RFC 3986,
NOT in RFC 3987. I seem to remember having told you
that several times already, but maybe you forgot.
http://B%C3%BCcher.de is a totally legal *U*RI according
to RFC 3986 (STD 66!), and per that STD maps to
http://xn--bcher-kva.de. The paragraph just
before Section 3.2.3 (last paragraph on p. 21) says this:
   The reg-name syntax allows percent-encoded octets in order to
   represent non-ASCII registered names in a uniform way that is
   independent of the underlying name resolution technology. Non-ASCII
   characters must first be encoded according to UTF-8 [STD63], and then
   each octet of the corresponding UTF-8 sequence must be percent-
   encoded to be represented as URI characters. URI producing
   applications must not use percent-encoding in host unless it is used
   to represent a UTF-8 character sequence. When a non-ASCII registered
   name represents an internationalized domain name intended for
   resolution via the DNS, the name must be transformed to the IDNA
   encoding [RFC3490] prior to name lookup. URI producers should
   provide these registered names in the IDNA encoding, rather than a
   percent-encoding, if they wish to maximize interoperability with
   legacy URI resolvers.
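As a rough illustration of the mapping described in that paragraph, here is a minimal Python sketch: percent-decode the host, interpret the octets as UTF-8, then apply IDNA ToASCII before lookup. The function name host_to_idna is hypothetical, and the sketch assumes Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490, including nameprep case folding); it is not meant as a complete resolver.

```python
from urllib.parse import unquote


def host_to_idna(uri_host: str) -> str:
    """Hypothetical helper: map a percent-encoded reg-name to its IDNA form."""
    # Decode %XX sequences; the octets are interpreted as UTF-8 and
    # invalid UTF-8 raises instead of being silently replaced.
    unicode_host = unquote(uri_host, encoding="utf-8", errors="strict")
    # Python's "idna" codec applies RFC 3490 ToASCII label by label;
    # nameprep case-folds "B" to "b" along the way.
    return unicode_host.encode("idna").decode("ascii")


print(host_to_idna("B%C3%BCcher.de"))  # xn--bcher-kva.de
```

So the percent-encoded host and the IDNA form name the same DNS host; the percent-encoding is just a URI-level representation of "Bücher".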
>incompatible with the spirit of STD 63 and DNS.
Sorry, but I don't see how the spirit of STD 63 or DNS
would have any say on percent-escaping in URIs. Can you
explain?
>The label "B%C3%BCcher" could exist, not at the
>level in your example, because DENIC won't allow
>it, but at other levels or in other zones.
Yes, that label of course could exist. But it would have
to be encoded as "B%25C3%25BCcher" in a URI/IRI.
[Back to the basics of escaping: Don't forget to escape the
escape character!]
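A small Python sketch of that point, assuming a hypothetical literal label "B%C3%BCcher" (containing actual "%" characters): escaping each "%" as "%25" keeps it distinct from the percent-encoded form of "Bücher", and one round of decoding gives back the literal ASCII label rather than "Bücher".

```python
from urllib.parse import quote, unquote

# Hypothetical literal DNS label that happens to contain "%" characters.
literal_label = "B%C3%BCcher"

# Escaping the escape character: every "%" becomes "%25".
encoded = quote(literal_label, safe="")
print(encoded)  # B%25C3%25BCcher

# Decoding once recovers the literal label, which is plain ASCII ...
decoded = unquote(encoded)
print(decoded)  # B%C3%BCcher

# ... and is not the same name as "Bücher" (xn--bcher-kva).
print(decoded == unquote("B%C3%BCcher"))  # False
```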
Also, it's not fully clear to me whether such a label would
actually be resolvable (some applications may apply ToASCII,
which I think would reject it; others might just send it off
after unescaping, because they wouldn't detect any non-ASCII
characters), but for this discussion, that's a detail.
>And it is totally unrelated to "xn--bcher-kva",
>unless the owner of "B%C3%BCcher"
>arranges it as alias for "xn--bcher-kva".
So much is correct.
>IMO the IE7 behaviour "doesn't work" is perfect,
>the FF3 / safari behaviour is completely broken.
The FF3 and safari behavior are correct, the IE7
behavior is outdated.
Regards, Martin.
#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp