Havoc

Frank Ellermann hmdmhdfmhdjmzdtjmzdtzktdkztdjz at gmail.com
Fri Aug 29 08:01:07 CEST 2008


Martin Duerst wrote:

> trying to reformat a "flowed" mail

Oops, sorry, checking what is going on.  My MUA picked
B64 UTF-8 in reply to Mark, but not flowed on my side:
<http://article.gmane.org/gmane.ietf.idnabis/2579/raw>

Maybe something on your side decided that B64 is a bad
idea and tried to improve it adding a "flowed" effect.
The wonders of i18n in e-mail...

>> It is a VERY DANGEROUS BUG in RFC 3987,
 
> Sorry, but the "bug", if it were a bug, is in RFC 3986,
> NOT in RFC 3987. I seem to remember to having told you
> that several times already, but maybe you forgot.

If we agree that it is dangerous and highly undesirable
I'm not picky if it is a bug or feature in 3987 or 3986.

It needs a very clear MUST NOT in IDNAbis, if that goes
with an "updates 3986" or similar, so be it.  I think
you can fix it directly in 3987bis. 

> http://B%C3%BCcher.de is a totally legal *U*RI according
> to RFC 3986 (STD 66!)

Yes, and it could be <reg-name> B%C3%BCcher.x.y.z.example
in a registry allowing such names.  That is not what Mark
meant when he wrote "works".  He expects to arrive at
the <reg-name> xn--bcher-kva.x.y.z.example 

| When a non-ASCII registered name represents an 
| internationalized domain name intended for resolution
| via the DNS, the name must be transformed to the IDNA
| encoding [RFC3490] prior to name lookup.  URI
| producers should provide these registered names in the
| IDNA encoding, rather than a percent-encoding, if they
| wish to maximize interoperability with legacy URI 
| resolvers.

That's the 3986 part of the bug.  It has the MUST right,
but the SHOULD is utterly dubious.  There is *no* excuse
to violate the SHOULD, and this is not limited to backwards
compatibility.  DNS supports label B%C3%BCcher "as is".

It is a perfectly valid DNS label like any other string
consisting of 1 to 63 octets.  Applications unaware of
the RFC 3986 fine print look up B%C3%BCcher and end up
nowhere if they are very lucky.  Or at a malicious site.
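
For illustration, the transformation that the quoted
RFC 3986 text requires before lookup could be sketched
as below.  This is only a sketch: the helper name
reg_name_to_idna is hypothetical, and it relies on
Python's built-in "idna" codec, which implements the
RFC 3490 (IDNA2003) ToASCII operation.

```python
from urllib.parse import unquote


def reg_name_to_idna(reg_name: str) -> str:
    """Turn a percent-encoded UTF-8 reg-name into its IDNA (ACE) form.

    Hypothetical helper for illustration only; it does no validation
    beyond what unquote() and the "idna" codec do themselves.
    """
    # Step 1: undo the percent-encoding, treating the octets as UTF-8.
    decoded = unquote(reg_name, encoding="utf-8", errors="strict")
    # Step 2: apply IDNA ToASCII label by label (RFC 3490 via the codec).
    return decoded.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(reg_name_to_idna("B%C3%BCcher.x.y.z.example"))
```

An application that skips both steps hands the literal
B%C3%BCcher label to the resolver, which is the case
described above.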

Or the percent-encoded literal is longer than 63 and
throws an unexpected error in a critical piece of code.
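
A rough way to see the length problem, assuming a label
of eleven "ü" characters picked purely as an example:
each non-ASCII character becomes six octets when it is
percent-encoded as UTF-8, so a short label can exceed
the 63 octet limit.  The names below are hypothetical.

```python
from urllib.parse import quote

MAX_LABEL_OCTETS = 63  # DNS limit for a single label


def percent_encoded_length(label: str) -> int:
    """Length of a label after percent-encoding its UTF-8 octets."""
    # safe="" so that nothing except unreserved characters is left as is.
    return len(quote(label, safe=""))


def fits_in_dns_label(label: str) -> bool:
    """True if the percent-encoded literal still fits in one DNS label."""
    return percent_encoded_length(label) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    example = "ü" * 11  # 11 characters, 22 UTF-8 octets
    print(percent_encoded_length(example), fits_in_dns_label(example))
```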

If RFC 3986 expected a "worldwide GetHostByName update"
for IDNA I'd be very curious what the IESG was smoking
when they approved it.

> Yes, that label of course could exist. But it would
> have to be encoded as "B%25C3%25BCcher" in an URI/IRI.

> [Back to the basics of escaping: Don't forget to escape
>  the escape character!]
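
The escaping rule quoted above can be checked with a
small sketch, assuming the literal DNS label really is
the eleven ASCII characters B%C3%BCcher.  The function
name is hypothetical; quote() and unquote() are from the
Python standard library.

```python
from urllib.parse import quote, unquote


def escape_literal_label(label: str) -> str:
    """Percent-encode a literal label, including its "%" characters."""
    # With safe="" every "%" in the input becomes "%25".
    return quote(label, safe="")


if __name__ == "__main__":
    literal = "B%C3%BCcher"
    escaped = escape_literal_label(literal)
    print(escaped)
    # Decoding once gives back the literal label, not "Bücher".
    print(unquote(escaped))
```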

Yes, we could arrive at a literal B%C3%BCcher even after
the worldwide upgrade.  But that won't happen: software
takes names "as is", and asks a DNS resolver API for its
opinion.  And that API will not transform percent-encoding
to punycode.  This is not the layer where IDNA happens,
we're far too low.  RFC 3987 is the layer for IDNA magic:

"URI producers SHOULD":  In a certain sense RFC 3987 is
a URI producer.  It has no good excuse to violate this
most critical SHOULD in STD 66.

RFC 3987 is the keystone of the complete IDNA business;
it can't be sloppy hoping that lower layers will fix it.

They won't.  Proof:  I put B%C3%BCcher.de in FF2 and got
an error.  I put B%C3%BCcher.de in rxgeturl (that's a
REXX script with a sockets API) and got an error.  I put
B%C3%BCcher.de in a modern dig.exe and got an error.
I put B%C3%BCcher.de in a W2K nslookup.exe and got an
error.  I put B%C3%BCcher.de in IE6 and got an error.

They all happily tried a literal "do what I mean", and
DENIC.de (or rather the DNS cache at my ISP) told them
"no such host".  See above - this is a *lucky* outcome.

Nobody knows what happens at B%C3%BCcher.x.y.z.example.

 Frank

P.S.:  There was a similar thread on the EAI list while
you were away.  Charles had some different ideas about
this issue; he wrote that IRIs are not yet defined for HTTP.


