Mapping Stability/Storage (was Re: M-Label or MVALID, and dangers with mappings?)

Sun Apr 12 09:34:55 CEST 2009

--On Saturday, April 11, 2009 18:13 -0700 Mark Davis
<mark at macchiato.com> wrote:

> Sure.
> (1) A browser captures a URL from an address bar, and sends it
> to a backend program, as is. The second process examines it
> for spoofing issues, and communicates back to the first.
> Remapping loses information that might be relevant to the user.

Yes.   But please remember that, if I push "FoOBaR" through an
application and into the DNS, I may get back "foobar", "FOOBAR",
the original, or another variation.   The standard is at least
moderately clear that I should get back what is stored, but
conforming to that requirement would yield a different answer
than what went in, and the requirement itself is widely ignored.
I think this is a variation on the theme of "don't expect to be
able to write novels in the DNS".
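
Any of those variants is an acceptable answer because DNS name matching folds ASCII case (RFC 4343). Preservation of the stored case is, in practice, not something to rely on. Below is a minimal sketch of that matching rule in Python, for illustration only. The helper name is hypothetical.

```python
def dns_names_equal(a: str, b: str) -> bool:
    """Compare two ASCII domain names the way DNS matching does.

    Only the ASCII letters A-Z/a-z are compared case-insensitively
    (RFC 4343); every other character must match exactly.
    """
    def fold(name: str) -> str:
        # Fold only A-Z; str.lower() would also fold non-ASCII letters.
        return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)

    return fold(a) == fold(b)


for variant in ("foobar", "FOOBAR", "FoOBaR"):
    print(variant, dns_names_equal(variant, "FoOBaR"))
```

All three variants name the same node, so a resolver is free to hand back whichever spelling the server happens to hold.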

> (2) I include http://ÖBB.at in this email message. My emailer
> is IDNA aware, and recognizes this as a URL. I send the email
> to yours, which is, I presume, also IDNA aware. I don't want
> either one to lowercase it.

Well, first of all, please remember that, if your software
recognizes that as a URL, it is non-conformant to the URL spec
(see below).  It is a perfectly good IRI, but there are
restrictions on where an IRI can be used.  Second, if it is in
the body of the email message and "recognized" as a URL,
something is doing something heuristic, partly because, with
running text in an email body, there is no way to know for sure,
at least absent natural-language parsing or analysis, whether it
is actually a URL or an example of something that looks like a
URL but that is not intended for anyone to try to use in a
protocol.

Put differently, the only protocol-relevant way that something
can "recognize" a string as a URL is if it appears in a URL
context or is otherwise marked as such.   So
   <a href="http://ÖBB.at/">some text</a>
if seen in a message identified with
   Content-Type: text/html; charset="utf-8"
is obviously an attempt at a URL, even though it is an invalid
one.
   <a href="http://xn--bb-eka.at/">other text</a>
would be both a URL and valid.   An MUA that changed the first
(invalid) form into the second would be bringing it into
conformance with the standard, which is usually considered a
desirable public service.
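
In the markup case, the conversion the MUA would perform is the IDNA ToASCII operation applied to the host part of the href. The sketch below assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490). That includes the nameprep mapping, which lowercases "ÖBB" before Punycode encoding. The IDNA2008 drafts under discussion would not perform that mapping themselves. The helper name is hypothetical, and the sketch does not handle userinfo or ports.

```python
from urllib.parse import urlsplit, urlunsplit


def href_to_ascii(href: str) -> str:
    """Rewrite the host of an IRI-style href into its ACE ("xn--") form.

    Sketch only: assumes the netloc is a bare host name (no userinfo or
    port) and relies on Python's IDNA2003 "idna" codec, which applies
    nameprep (including case folding) before Punycode encoding.
    """
    parts = urlsplit(href)
    ascii_host = parts.netloc.encode("idna").decode("ascii")
    return urlunsplit(parts._replace(netloc=ascii_host))


print(href_to_ascii("http://ÖBB.at/"))         # http://xn--bb-eka.at/
print(href_to_ascii("http://xn--bb-eka.at/"))  # already ASCII, unchanged
```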

By contrast, "I include http://ÖBB.at in..." is part of a
sentence and not a URL context.   I would expect software to not
mess with sentences and would treat altering "ÖBB.at" into
either "öbb.at" or "xn--bb-eka.at" as obnoxious behavior, fully
as obnoxious as changing "I include" in that sentence into "i
InCluDe".  It is perhaps also worth noting that something that
changes
  <a href="http://xn--bb-eka.at/">http://ÖBB.at/</a>
into 
  <a href="http://xn--bb-eka.at/">http://öbb.at/</a>
is guilty of the same obnoxious behavior.
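
That rewrite is what happens when software regenerates the display text from the href. The mapping step has already discarded case, so decoding the A-label can only produce "öbb.at". The illustration below again assumes Python's IDNA2003 codec, and the helper name is hypothetical.

```python
def display_from_ace(host: str) -> str:
    """Convert an ACE host name back to Unicode for display (IDNA2003 ToUnicode)."""
    return host.encode("ascii").decode("idna")


original = "ÖBB.at"
ace = original.encode("idna").decode("ascii")  # 'xn--bb-eka.at'
shown = display_from_ace(ace)                   # 'öbb.at'

# The original capitalization cannot be recovered from the A-label.
print(shown == original)  # False
```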

> ...
> The question back to you is what concrete problems are
> solved by making this a MUST.

Two responses, with the first being the more important one:

(1) The problem with non-conformant behavior that goes beyond
the standard is that you never know what an implementation you
haven't seen before is going to do with it.  No matter how many
implementations you find that do what you are expecting, i.e.,
treating "http://ÖBB.at" as if it were a valid URL and sorting
out the details at the last moment, there will be one out there
that doesn't, leading to inconsistent and unpredicted behavior.
To take an example relevant to this particular case, I've still got an
older copy of Lynx running on one of my systems and, while it is
just fine with "http://xn--bb-eka.at/", it is pretty sure that
anything with non-ASCII UTF-8 in the domain part isn't a URL.
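
A parser of that strict kind accepts the A-label form and rejects the raw IRI form before any IDNA processing happens. Below is a hypothetical sketch of such a check (not Lynx's actual logic). It uses the traditional letters-digits-hyphen ("LDH") host name rule.

```python
import re
from urllib.parse import urlsplit

# Letters, digits and hyphens, not starting or ending with a hyphen,
# 1-63 characters per label (the classic LDH host name rule).
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def strict_host_ok(url: str) -> bool:
    """Hypothetical strict check: accept only ASCII LDH host names."""
    host = urlsplit(url).hostname or ""
    if not host.isascii():
        return False  # non-ASCII host: "not a URL" to this parser
    return all(_LDH_LABEL.fullmatch(label) for label in host.split("."))


print(strict_host_ok("http://xn--bb-eka.at/"))  # True
print(strict_host_ok("http://ÖBB.at/"))         # False
```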

You can, of course, make the argument that maybe the
implementation won't be conformant either (plus or minus this
local extension), but that is a qualitatively different
assertion, IMO.

(2)  As I have said before, the exercise of "prove that it will;
no, you prove that it won't" doesn't get us anywhere without
some agreed principles about who has to prove what.  I suggest
that clarity in a standard argues for having as few exceptions
as possible.  We say "SHOULD" when there are likely to be
exception cases.  IMO, those who say "SHOULD" in the sort of
context represented by the IDNA documents should (sic) be
prepared to identify some examples of where the exceptions might
be found.  I believe that Section 3.2 of TUS (The Unicode
Standard) illustrates that "no exceptions" clarity argument: it
uses "shall" or "shall not" in almost every conformance clause.
The exceptions are one "must" and, in C19, a "may" that is
immediately followed by "must disclose".

regards,
     john