Stop me if I've misunderstood...

Thu Jul 9 17:07:27 CEST 2009

--On Wednesday, July 08, 2009 21:03 +0100 Gervase Markham
<gerv at mozilla.org> wrote:

> I must confess that I've not had time recently to follow
> carefully the  discussions about mappings. If that means that
> some people consign this  message to the bit bucket, so be it.
> 
> At the moment, standard domain names have what I'll call
> "bus-ability" -  that is, if you see them in an advert on the
> side of a bus, you can  write them down, type them into any
> web browser or other domain  name-using client later, and
> you'll end up at the place intended by the  creator of the
> advertisement. IDN domain names under the current version  of
> IDN have, as far as I understand it, pretty much the same 
> "bus-ability" property. In the IDN case, what the user types
> has to be  first normalized, and then converted to punycode.
> The user in no way  needs to know or care about this extra
> technical complexity. It just works.

Correct, although, in principle, someone can put characters on
the side of that bus that will look very confusing and will
render correctly into a domain name only if the user figures out
how to enter them correctly.  Since some of those characters are
unlikely to appear on keyboards, the odds of users getting that
right are not high.

Consequently, those who put domain names (even IDNs) on buses
are well advised to select characters that require a minimum of
mapping and other transformations.

The NFC-type normalization part of the conversion process is
used to protect, not against those "compatibility" characters
but against the fact that different keyboard arrangements may
cause the same character to be represented in different ways,
e.g., as either a single precomposed character or as a base
character plus a combining character.
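
As a rough illustration of that last point, here is a minimal
Python sketch using the standard unicodedata module.  The choice
of "e with acute" as the example character and the helper name
nfc() are mine, chosen only for illustration:

```python
import unicodedata

# The same visible character, as two different keyboards or input
# methods might produce it.
precomposed = "\u00e9"    # single precomposed code point
decomposed = "e\u0301"    # base letter plus combining acute accent


def nfc(text):
    """Return the NFC (canonical composed) form of text."""
    return unicodedata.normalize("NFC", text)


# Before normalization the two strings differ code point by code point.
print(precomposed == decomposed)

# After NFC normalization they are identical.
print(nfc(precomposed) == nfc(decomposed))
```

The point is only that the two inputs are indistinguishable to
the person at the keyboard, so something has to reconcile them
before the name is looked up.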

> I would assert that this property is pretty key to keeping the
> web  working in a sane and, importantly, secure manner. People
> convert domain  names from print/voice/memory to computer and
> back all the time.

I don't think IDNA2008, with or without the most recent
proposals, changes that property.  The main thing IDNA2008 does
that is different from IDNA2003 is to strongly discourage the
use, in those adverts, of any string that requires mapping.  In other
words, if one uses the target strings -- what we are now calling
U-labels (or, while I would not suggest them in ads, A-labels)
-- on the sides of the bus, there will be no ambiguity about
names, no risks of names being interpreted differently between
IDNA2008 and IDNA2003, and so on, which is exactly the goal I
think you are asking for.
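
For readers who have not looked at the two forms side by side,
the following is a small sketch of the relationship between a
U-label and its A-label, using Python's built-in "punycode"
codec.  It assumes the label is already in its final form
(lowercase, NFC, containing at least one non-ASCII character)
and does none of the validity checking a real implementation
needs; the function names are hypothetical:

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label):
    """Encode an already-final U-label as an A-label (no validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode an A-label back to its U-label (no validation)."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


u_label = "b\u00fccher"          # "bucher" with u-umlaut, precomposed
a_label = to_a_label(u_label)

print(a_label)
print(to_u_label(a_label) == u_label)
```

Because no mapping is involved, the conversion is a clean round
trip: each U-label has exactly one A-label and vice versa.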

The thinking of the design group to which Andrew refers was
strongly influenced by your own comments to the effect that, if
there was ever a choice between your perceptions about what was
necessary to protect users and whatever the standard said, you
were going to choose the former every time.   The "apply as
little mapping as possible" and "use only final (canonical)
strings in URLs" strategies both follow from that.

Shawn's discussion of the characteristics of "as little as
possible" seems right to me.  Certainly I would not recommend
implementing the portion of a browser into which users type
characters they think they see on buses without case mapping
(especially in places where everyone would assume that different
cases will match) or width mapping (especially in places where
everyone would assume that characters of different widths would
compare equal), but the IDNA2008 design has allowed for those
transformations in the UI and prior to IDNA2008-specific
processing since the very beginning... and no one is proposing
to change that now.
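
To make "case mapping" and "width mapping" concrete, here is a
sketch of the sort of pre-processing a UI might apply to typed
input before any IDNA2008-specific processing.  This is an
assumption-laden illustration, not the mapping any specification
defines: the function names are hypothetical, simple lowercasing
stands in for case mapping, and width mapping is approximated by
following the <wide> and <narrow> decompositions in the Unicode
character database.

```python
import unicodedata


def width_map(text):
    """Replace fullwidth and halfwidth characters by their ordinary forms."""
    out = []
    for ch in text:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith("<wide>") or decomp.startswith("<narrow>"):
            # e.g. "<wide> 0045" -> "E"
            code_points = decomp.split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            out.append(ch)
    return "".join(out)


def ui_premap(text):
    """Hypothetical UI-level mapping: width, then case, then NFC."""
    return unicodedata.normalize("NFC", width_map(text).lower())


# Fullwidth capitals, as an East Asian input method might produce them.
typed = "\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25.com"
print(ui_premap(typed))
print(ui_premap("Example.COM"))
```

Everything past this step would then see only the mapped string.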

> If the standards were to change in such a way that it becomes
> quite  legal and conforming that typing a set of characters
> into browser A  takes you to website Q, but typing the same
> set into browser B takes you  to website R, I would politely
> suggest that those who wrote the new  standards had taken
> leave of their senses. This is a recipe for chaos.  And
> phishing.

The recipe for chaos lies in having multiple different URIs (and
apparent domain names) that don't compare equal on a string
basis but that do map to the same domain name, especially when
some of them display in an apparently-reasonable way locally
and, due to font availability or other differences, others do
not.  That chaos is enhanced (if you like chaos) by rules in
HTML and elsewhere that specify that URI matching (especially
for URIs that are not intended to be resolved) depends on exact
string comparison.
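
A short sketch of the problem, using Python's built-in "idna"
codec, which implements the IDNA2003 (RFC 3490) ToASCII
operation including its mappings.  The sample names are my own
and the helper name is hypothetical:

```python
# Four strings a user might see or type.  They differ in case,
# in composed versus decomposed form, or in being an A-label.
candidates = [
    "B\u00fccher.example",     # capital B, precomposed u-umlaut
    "b\u00fccher.example",     # all lowercase, precomposed
    "bu\u0308cher.example",    # u followed by combining diaeresis
    "xn--bcher-kva.example",   # the A-label form
]


def resolved_name(name):
    """Name that would be looked up under IDNA2003 mapping rules."""
    return name.encode("idna")


# Exact string comparison treats all four as different...
print(len(set(candidates)))

# ...yet under IDNA2003 every one of them maps to the same name.
print({resolved_name(c) for c in candidates})
```

Any software that matches these by exact string comparison will
treat them as four unrelated names, even though a resolver that
applies the mappings sends all four to the same place.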

As far as phishing is concerned, IMO just about the worst thing
we can do to ourselves and our users --even worse than "do you
want to accept this certificate" popups-- would be to get users
used to the notion that domain names are the same even if they
look different.

But it is worth noting that, if and as URIs-as-used evolve to
match both the standard specification and the "final, canonical,
name" requirement of IDNA2008, neither your concerns nor Mark's
will be relevant: they are of concern only in situations in
which one or more non-final names are expected to match final
ones, especially in ways that are not obvious.

> This incredible outcome is not a serious possibility, is it?

No, it is not.  At least as I understand things.

   john