Q3: What characters should be allowed in a revised IDNA2008 specification?

Sat Apr 4 10:44:09 CEST 2009

Erik,

When I read this, my reaction is something close to "sure, but
so what?".  Let me explain:

We do Internet standards in order to lay out rules and
procedures that, if followed, will maximize interoperability,
minimize operational problems, and so on.   If someone doesn't
conform, we don't try to call the protocol police... and that is
only partially because they won't answer.  Very few of our
standards go beyond "does not conform" to try, e.g., to specify
what should happen if non-conforming behavior is encountered.
We've even got a design and implementation principle that
encourages implementations that send data to interpret the text
of the standards as narrowly as possible and those who receive
data to be more permissive than the standards require. 

But the standards are violated all the time at all levels of the
stack, sometimes by implementations that are just sloppy,
sometimes by ones that think they can gain some advantage by
doing things in a different way, and sometimes by folks who are
just a little too impressed by their own cleverness.   And,
because good implementations that conform to the standards are
typically permissive until that causes them problems, those
non-conforming things often work, at least for a while and at
least with some other implementations.  That doesn't make the
standards bad, nor does it justify changing them to include
anything that someone might want to try to get away with.

There is an interesting example in the history of IDNs: a few
domains decided that they weren't interested in waiting for
what became IDNA, or for much of anything else, and just
registered UTF-8 strings.  The DNS cares only about octets, so
they were still conforming to the base DNS protocols.   That
didn't work very well, first because "just register UTF-8
strings", without any supporting structure, puts one in the
ultimate "no mapping" situation (not even NFC was required),
and second because many application implementations wouldn't
have anything to do with those domain names.  But sometimes they
worked.  They even worked with a few pre-IDNA browsers that,
whether in the interest of robustness or through sloppy
implementation, simply accepted the UTF-8 strings and passed
them to the DNS.

Then we produced IDNA and browsers started being switched over
to notice non-ASCII domain names and convert them appropriately.
At least for web applications involving those browsers, those
UTF-8 domain names effectively stopped working... and I haven't
noticed us worrying about transitions, either now or in
IDNA2003.   But, even now, I suspect that there are some
application implementations, and especially an email MTA or
two, that will accept UTF-8 domain names and pass them directly
into the DNS for resolution.
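To make the "stopped working" point concrete, here is a minimal sketch, assuming Python 3 and using `bücher` as an illustrative label (it is not one from this thread). A raw-UTF-8 registration and an IDNA lookup put different octets on the wire, so a name registered only in the first form is not found by clients that send the second. Python's built-in `idna` codec follows IDNA2003, not IDNA2008; the function names are hypothetical.

```python
def raw_utf8_label(label: str) -> bytes:
    """Octets a pre-IDNA client might hand straight to the DNS resolver."""
    return label.encode("utf-8")


def idna2003_label(label: str) -> bytes:
    """Octets an IDNA2003 client looks up: nameprep, Punycode, and the 'xn--' prefix."""
    return label.encode("idna")


if __name__ == "__main__":
    label = "bücher"
    print(raw_utf8_label(label))   # b'b\xc3\xbccher'
    print(idna2003_label(label))   # b'xn--bcher-kva'
```

Since the two byte strings differ, a zone that only contains the raw UTF-8 form has no answer for the `xn--` query.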

In that context, let's look at your example in the light of a
comment Gerv made a few days ago.

> PS This works in Firefox: http://近親相姦☆.sblo.jp/

No surprise.  The characters are valid in IDNA2003, and the
guidance against registering a label containing "☆" (that star
character; I get a box when I try to copy and paste it)
applies to registries, not to what we now call lookup
applications.  On the other hand, if and when Gerv and his
colleagues conclude that the character poses a significant
danger to users, they will stop looking it up.  That implies a
couple of things:

	* different behavior in different browsers toward that
	type of character.

	* labels that can be successfully looked up one day, but
	not the next, independent of whatever IDNA says.

If I recall correctly, Gerv's description of that situation was
"Ick".  I agree with him.  I also note that registrants who
understand that situation will avoid such labels, with the
"no symbols" requirement of IDNA2008 acting as a significant
warning, and that banning all symbols in the standard reduces
the long-term opportunities for domain names appearing to go bad.

It seems to me those are significant advantages, even if one
understands that there is no hope of completely banning
undesired behavior.
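A small sketch of the two positions, assuming Python 3. The standard library's `idna` codec implements IDNA2003 (nameprep plus Punycode), so it accepts the label from Erik's example. The hypothetical helper `contains_symbol` is only a rough stand-in for the IDNA2008 "no symbols" rule: it flags any code point whose Unicode general category is a symbol (Sm, Sc, Sk, So). It is not the IDNA2008 derived-property calculation, which uses more inputs and exceptions.

```python
import unicodedata

DOMAIN = "近親相姦☆.sblo.jp"  # the name from Erik's PS


def idna2003_ace(domain: str) -> bytes:
    """Convert with Python's built-in codec, which follows IDNA2003 (nameprep + Punycode)."""
    return domain.encode("idna")


def contains_symbol(label: str) -> bool:
    """Hypothetical, rough proxy for the IDNA2008 'no symbols' rule:
    True if any code point has a Symbol general category (Sm, Sc, Sk, So).
    This is NOT the IDNA2008 derived-property calculation."""
    return any(unicodedata.category(ch).startswith("S") for ch in label)


if __name__ == "__main__":
    # IDNA2003 accepts the star (U+2606, category So) and yields an xn-- label.
    print(idna2003_ace(DOMAIN))
    # The crude symbol check flags the first label because of the star.
    first_label = DOMAIN.split(".")[0]
    print(contains_symbol(first_label))  # True
```

The contrast is the point of the thread: the same label is acceptable to an IDNA2003 lookup and would be excluded by a rule that bans symbols outright.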

    john

--On Wednesday, April 01, 2009 10:21 -0700 Erik van der Poel
<erikv at google.com> wrote:

> I believe some client implementations and registries will
> eventually support some symbols fully, no matter what this WG
> says.
> 
> Computing has changed quite a bit. The keyboard is just one
> way to enter a domain name. These days, many people use the
> mouse. They type queries into search engines, and click on
> search results that look interesting. They write blogs and
> make them available at domain names of their own choice,
> allowing readers to navigate to them with their mouse or
> handheld device. Or they redirect the user's typed URL to some
> other URL with their preferred characters. The domain name
> system does not have to be restricted to the keyboard any more.
> 
> Users must be taught to be careful with their private info,
> passwords, money, etc. UIs must display domain names carefully
> when secure transactions occur. For example, HTTPS connections
> require extra care. Users should be encouraged to type their
> bank's domain name, instead of clicking on an untrusted URL.
> 
> When a person approaches a Wells Fargo bank, to use the ATM,
> they look at the name of the bank, the coloring, etc., to make
> sure it's familiar. They make sure nobody is looking while
> they type in their PIN.
> 
> On the other hand, when a person approaches a toy store, the
> name and coloring of the store do not matter. The person is
> drawn to the store by the window displays, etc.
> 
> So my conclusion is that although a lot of work remains to be
> done to make UIs secure when they are required to be secure,
> ultimately this WG will not be able to keep symbols out of the
> lower-level domain names, and perhaps even the high-level
> ones, considering ICANN's recent behavior.
> 
> I realize that this is likely to be controversial, but I am
> just offering my opinion with a long-term view.
> 
> Erik
> 
> PS This works in Firefox: http://近親相姦☆.sblo.jp/