Q3: What characters should be allowed in a revised IDNA2008 specification?

Sat Apr 4 18:18:51 CEST 2009

On Sat, Apr 4, 2009 at 1:44 AM, John C Klensin <klensin at jck.com> wrote:
> When I read this, my reaction is something close to "sure, but
> so what?".  Let me explain:
>
> We do Internet standards in order to set out a set of rules and
> procedures that, if followed, will maximize interoperability,
> minimize operational problems, etc.   If someone doesn't
> conform, we don't try to call the protocol police... and that is
> only partially because they won't answer.  Very few of our
> standards go beyond "does not conform" to try, e.g., to specify
> what should happen if non-conforming behavior is encountered.
> We've even got a design and implementation principle that
> encourages implementations that send data to interpret the text
> of the standards as narrowly as possible and those who receive
> data to be more permissive than the standards require.
>
> But the standards are violated all the time at all levels of the
> stack, sometimes by implementations that are just sloppy,
> sometimes by ones that think they can gain some advantage by
> doing things in a different way, and sometimes by folks who are
> just a little too impressed by their own cleverness.   And,
> because good implementations that conform to the standards are
> typically permissive until that causes them problems, those
> non-conforming things often work, at least for a while and at
> least with some other implementations.  That doesn't make the
> standards bad, nor does it justify changing them to include
> anything that someone might want to try to get away with.
>
> There is an interesting example in the history of IDNs with a
> few domains who decided that they weren't interested in waiting
> for what became IDNA or much of anything else and just
> registered UTF-8 strings.  The DNS cares only about octets, so
> they were still conforming with the base DNS protocols.   That
> didn't work very well, first because "just register UTF-8
> strings", without any supporting structure, puts one in the
> ultimately "no mapping" situation (not even NFC was required)
> and second because many applications implementations wouldn't
> have anything to do with those domain names.  But sometimes they
> worked.  They even worked with a few pre-IDNA browsers that, in
> the interest of either robustness or sloppy implementations,
> simply accepted the UTF-8 strings and passed them off to the DNS.
>
> Then we produced IDNA and browsers started being switched over
> to notice non-ASCII domain names and convert them appropriately.
> At least for web applications involving those browsers, those
> UTF-8 domain names effectively stopped working... and I haven't
> noticed us worrying about transitions, either now or in
> IDNA2003.
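
To make the contrast above concrete, here is a minimal Python sketch of the two
approaches: handing raw UTF-8 octets to the DNS versus converting a name the
IDNA way. The helper names and the sample label are illustrative assumptions,
not anything from the thread. Python's built-in "idna" codec implements
IDNA2003 (RFC 3490), not IDNA2008.

```python
def raw_utf8_name(name: str) -> bytes:
    """'Just register UTF-8': the octets go to the DNS with no mapping at all."""
    return name.encode("utf-8")


def idna2003_name(name: str) -> bytes:
    """IDNA2003 ToASCII per label: nameprep mapping, then Punycode with 'xn--'."""
    return name.encode("idna")


if __name__ == "__main__":
    for sample in ("bücher", "Bücher"):
        # Raw UTF-8 keeps the case difference; IDNA2003 maps both to one label.
        print(sample, raw_utf8_name(sample), idna2003_name(sample))
```

Without any mapping step, the two spellings produce different octet strings,
while the IDNA2003 conversion folds them into the same ASCII label.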

Some of us have had to deal with pre-IDNA behavior on both the client
and server sides. For example, MSIE6 accepts both %-escaped and
not-%-escaped non-UTF-8 encodings in the host name and does different
things for DNS and HTTP. For DNS, it runs the text through the
localized version of Windows' "ANSI" converter to UTF-8, and for the
HTTP Host: header, it puts the not-%-escaped bytes in there, directly.
Some server-side implementations take advantage of this behavior by
setting up wildcard domains and then using the Host: header to perform
a lookup in their database (for a product, search, topic, etc.).
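
As a rough illustration of that server-side pattern, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the zone
".example.com", the cp1252 encoding (one possible Windows "ANSI" code page),
the CATALOGUE table and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical database keyed by the decoded leftmost label.
CATALOGUE = {"bücher": "books landing page"}


def lookup_from_host(
    host_header: bytes,
    zone: bytes = b".example.com",  # assumed wildcard zone
    encoding: str = "cp1252",       # assumed legacy client encoding
) -> Optional[str]:
    """Map the raw bytes of a Host: header to a catalogue entry, if any."""
    host = host_header.split(b":", 1)[0].lower()  # drop any port; ASCII-only lowercasing
    if not host.endswith(zone):
        return None
    raw_label = host[: -len(zone)]
    try:
        key = raw_label.decode(encoding)  # the bytes arrive un-escaped, in the client's encoding
    except UnicodeDecodeError:
        return None
    return CATALOGUE.get(key)


if __name__ == "__main__":
    print(lookup_from_host(b"b\xfccher.example.com"))
```

The DNS wildcard only gets the request to the server; the actual selection
happens on the bytes in the Host: header, which is why this depends on the
client sending them unconverted.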

> But, even now, I suspect that there are some
> applications implementations, and especially an email MTA or
> two, that will accept UTF-8 domains and pass them directly into
> the DNS for resolution.
>
> In that context, let's look at your example in the light of a
> comment Gerv made a few days ago.
>
>> PS This works in Firefox: http://近親相姦☆.sblo.jp/
>
> No surprise.  The characters are valid in IDNA2003, the guidance
> against registering a label containing "☆" (that star
> character -- I get a box when I try to copy and paste it)
> applies to registries and not what we now call lookup
> applications.  On the other hand, if and when Gerv and his
> colleagues conclude that the character poses a significant
> danger to users, they will stop looking it up.  That implies a
> couple of things:
>
>        * different behavior in different browsers about that
>        type of character.
>
>        * labels that can be successfully looked up one day, but
>        not the next, independent of whatever IDNA says.
>
> If I recall, Gerv's description of that situation was "Ick".  I
> agree with him.  I also note that registrants who understand
> that situation and want to avoid it will avoid such labels, with
> the "no symbols" requirement of IDNA2008 acting as a significant
> warning, and that banning all symbols in the standard reduces
> the long-term opportunities for domain names appearing to go bad.
>
> It seems to me those are significant advantages, even if one
> understands that there is no hope of completely banning
> undesired behavior.

Right. This WG is lucky to have implementers participating in the
discussions, and trying to reach consensus. Things can go a lot worse,
with implementers leaving the WG and starting their own WGs, specs and
implementations. Witness HTML5.

Isn't that why we're all here? To try to reach consensus on a spec
that doesn't suffer the fate of OSI, SGML, DSSSL and XHTML, i.e. a
lack of widespread acceptance?

I do agree with Gerv's "Ick". We need to get the implementations to
align more. Actually, Firefox is one of the problems here, with its
insistence on displaying Punycode for *.com.

Erik

> --On Wednesday, April 01, 2009 10:21 -0700 Erik van der Poel
> <erikv at google.com> wrote:
>
>> I believe some client implementations and registries will
>> eventually support some symbols fully, no matter what this WG
>> says.
>>
>> Computing has changed quite a bit. The keyboard is just one
>> way to enter a domain name. These days, many people use the
>> mouse. They type queries into search engines, and click on
>> search results that look interesting. They write blogs and
>> make them available at domain names of their own choice,
>> allowing readers to navigate to them with their mouse or
>> handheld device. Or they redirect the user's typed URL to some
>> other URL with their preferred characters. The domain name
>> system does not have to be restricted to the keyboard any more.
>>
>> Users must be taught to be careful with their private info,
>> passwords, money, etc. UIs must display domain names carefully
>> when secure transactions occur. For example, HTTPS connections
>> require extra care. Users should be encouraged to type their
>> bank's domain name, instead of clicking on an untrusted URL.
>>
>> When a person approaches a Wells Fargo bank, to use the ATM,
>> they look at the name of the bank, the coloring, etc, to make
>> sure it's familiar. They make sure nobody is looking while
>> they type in their PIN.
>>
>> On the other hand, when a person approaches a toy store, the
>> name and coloring of the store do not matter. The person is
>> drawn to the store by the window displays, etc.
>>
>> So my conclusion is that although a lot of work remains to be
>> done to make UIs secure when they are required to be secure,
>> ultimately this WG will not be able to keep symbols out of the
>> lower-level domain names, and perhaps even the high-level
>> ones, considering ICANN's recent behavior.
>>
>> I realize that this is likely to be controversial, but I am
>> just offering my opinion with a long-term view.
>>
>> Erik
>>
>> PS This works in Firefox: http://近親相姦☆.sblo.jp/