Stop me if I've misunderstood...

Mark Davis ⌛ mark at macchiato.com
Fri Jul 10 19:15:13 CEST 2009


I'd like to echo what Shawn says. Having an effectively indeterminate
mapping is nasty for browser vendors, and for search engines, and email
clients, and for others -- and ultimately that translates into being
problematic for end users. (By "effectively indeterminate", I mean that a
lookup of href='http://übergrößen.de' may go to different places, depending
on the version of IDNA, and href='http://Übergrößen.de' may or may not work
at all, depending on the version of IDNA.) He's talked about the browser
side; look at some other cases.
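
To make the divergence concrete, here is a rough sketch rather than a
conforming implementation. It relies on Python's built-in "idna" codec,
which implements the IDNA2003 ToASCII operation including its mappings. The
function lookup_without_mapping is a hypothetical stand-in for a lookup that
applies no mapping at all: it assumes ß is accepted as-is and that uppercase
non-ASCII input is simply rejected, and it skips every other validity check
(code point tables, CONTEXT rules, bidi).

```python
def lookup_idna2003(host: str) -> bytes:
    """IDNA2003 lookup form: the stdlib codec case-folds and maps ß to ss."""
    return host.encode("idna")


def lookup_without_mapping(host: str) -> bytes:
    """Hypothetical no-mapping lookup (an assumption, not a full IDNA2008)."""
    labels = []
    for label in host.split("."):
        if label.isascii():
            # Plain ASCII labels go on the wire unchanged.
            labels.append(label.encode("ascii"))
            continue
        if label != label.lower():
            # With no case mapping, uppercase non-ASCII input just fails.
            raise ValueError(f"label {label!r} would need mapping")
        # Encode the label exactly as typed, ß included.
        labels.append(b"xn--" + label.encode("punycode"))
    return b".".join(labels)


if __name__ == "__main__":
    for host in ("übergrößen.de", "übergrössen.de", "Übergrößen.de"):
        old = lookup_idna2003(host)
        try:
            new = lookup_without_mapping(host)
        except ValueError as err:
            new = f"rejected ({err})"
        print(f"{host}: with mapping={old!r} without mapping={new!r}")
```

With the mapping, all three spellings collapse to one name. Without it,
übergrößen.de becomes a different name from übergrössen.de, and the
capitalized form does not resolve at all.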

At Google, we associate a URL with content. Internally, we depend heavily on
that association, and build up indices on that basis. When
href='http://Übergrößen.de' goes to a different place than
href='http://übergrössen.de' (on different browsers or different
combinations of browsers), that
complicates our internals considerably. It also makes presentation to users
more difficult; when URLs are displayed we don't know which form to provide;
doing that on the basis of the user's browser version is unpleasant, and if
the version is older, we'd have no choice but to present the punycode form
for the new mapping.

It is, of course, technically feasible to associate one URL with two
different pages, but the complications of doing this with indices that span
you-can't-even-believe-how-many pages are pretty daunting. And the security
implications are problematic; what if the URL leads to a spoof site on some
browsers, but to a legitimate site on others?

And take email clients. Most good ones nowadays parse content and turn URLs
into links. What should they do with
http://Übergrößen.de?
Which link did the originator of the email mean? What browser were they
working on? That is impossible to know.

Moreover, I don't see this changing. All the vendors will need to support
IDNA2003 mappings, otherwise users will complain (as Shawn says) that
the vendor's products are broken. (This WG is insulated from those user
complaints.) Because the vendors will all need to support both forms, people
can and will go on generating content that uses the IDNA2003 mappings.
Because people will continue to generate content that uses those mappings,
the vendors will need to keep supporting them.

The worst result would be that we come out with IDNA2008, and it gets
ignored by some major players because of these incompatibilities. Look at
XML 1.1; it had significant advantages over XML 1.0, and a few minor
incompatibilities with XML 1.0 (minor at least in the eyes of its working
group), but those "minor" incompatibilities stopped it dead in the water.

===

What has become clear to me is that the problems with mapping that inspired
the original design of IDNA2008 were primarily problems on the *registration*
side. When I look back over the email, all of them seem to be associated
with complaints from registrars (it's a bit difficult to make out, since
there is no citation of data as to the magnitude of problems, just anecdotes
without specifics, but this seems to be the case). And I think we have
developed a consensus that forbidding a mapping on the registration side is
an improvement, and that it is feasible to put into place.

Unfortunately, however, that principle has carried over to the lookup side,
where changing the mapping from IDNA2003 represents a real problem for
vendors and their users. As I said, I think we can accommodate mapping
changes for extremely infrequent characters that appear vital for some
communities: the ZWJ/ZWNJ (with appropriate CONTEXT restrictions because of
the spoofing opportunities represented by these). But there is little reason
to change the other mappings from IDNA2003, and a lot of downside to not
having them be obligatory in the lookup transformation from text to
punycode.

Mark