Eszett and IDNAv2 vs IDNA2008

John C Klensin klensin at jck.com
Sun Mar 15 19:00:43 CET 2009



--On Sunday, March 15, 2009 9:19 AM -0700 Erik van der Poel 
<erikv at google.com> wrote:

>> The percentage of IE6 users who downloaded the VeriSign
>> plug-in in order to access IDN-labeled resources was 100%.
>
> :-)
>
> Yes, but what percentage of the users that were redirected to
> the plug-in download page actually downloaded it?

Our colleagues in China might want to comment on that because I 
think they have some statistics on the use of the various 
plug-ins, but the answer is that, for those who care about IDNs 
and encountered them, it is a very large number.  There really 
is empirical experience with this... and, incidentally, a rather 
large number of people who still run IE6, either because they 
didn't like some other features/ characteristics of IE7 or 
because their environments, whether they support automatic 
updating or not, did not automatically and successfully convert 
them from IE6 to IE7.  Indeed and FWIW, the machine I'm now 
sitting in front of does not have IE7 installed -- in 
considerable measure because I got tired of one security alert 
after another, I don't use IE at all when I don't have to, and I 
have found far more IE-sensitive/IE-requiring sites that work 
better with IE6 than with IE7.  That is, of course, purely 
anecdotal, but it leads to another observation: however rude you 
might think it is to require one particular browser, it is done 
all the time (how long has it been since you've seen a "this 
site works best with foo" or "this site requires bar" icon or 
notice?).

>>> I'm not sure whether Eszett users are clamoring for IDNAbis.
>>> Perhaps the Greeks are, but not because of Final Sigma. They
>>> want to solve their tonos problem. And I have to agree that
>>> ZWJ/ZWNJ users seem to be clamoring for IDNAbis.
>>
>> How on earth did IDNA2008 become a matter of providing
>> support for this handful of code points?
>
> I confess that I am overly focussed on this handful, primarily
> because of compatibility issues.

See above.  That is not an argument that you should not be 
focused on the characters that require some transition measures 
(or accepting the disruption and moving on), but a suggestion 
that it is worth keeping goals in mind (or perhaps we also disagree 
about the goals).

>...
> If IDNA2008 plug-ins start performing multiple lookups, we will
> probably see HTML files start to take advantage of that. Then,
> when the browser developers get around to implementing
> IDNA2008, they may feel compelled to perform multiple lookups
> too, just to make those new HTML files "work".

This is, of course, a risk whether there are work-around 
plug-ins or not... and is the reason I tend to favor letting the 
registries sort things out rather than encouraging 
implementations to do so.  And that comment applies as strongly 
to trying to make DNAME or CNAME queries for alternate labels as 
it does to some multiple lookup technique.

> Of course, this WG's intention is to use multiple lookup as a
> transition strategy, but if HTML files and Web sites that rely
> on the "old" lookup don't disappear completely, the browser
> developers may feel compelled to continue the multiple
> lookups, perhaps forever. Programmers often add features, but
> are scared to remove features, for fear of "breaking"
> something.

But, here, the issue is whether the new characters are fully 
available and, at the risk of resurrecting the wildcard debate 
and its side-effects and alternatives, getting us back into the 
problem of whether "no node found at that label" really means 
"stop and report failure to user" or "try to 'help' the user by 
diverting her into some alternate method or search for alternate 
names".  If it means the former, then a failure to find a label 
should be the end of it, with no second lookup.   If it means 
the latter, then it is possible that a failure to find a label 
ought to send us off into the user's preferred (or more likely 
the implementer's default) search machinery that knows about far 
more language and locale-specific orthographic variations than 
those that are specific to IDNA2003 -> IDNA2008 transitions.
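
To make the two readings of "no node found" concrete, here is a minimal sketch in Python.  Everything in it is an assumption made for illustration: the ZONE dictionary stands in for a DNS zone, the function names are hypothetical, and the only variant rule shown is the IDNA2003-style mapping of Eszett to "ss".  It is not a proposal for what implementations should do.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a DNS zone: label -> address.
# Only the "ss" spelling is registered in this example.
ZONE: Dict[str, str] = {"strasse": "192.0.2.1"}


def lookup_strict(label: str, zone: Dict[str, str]) -> Optional[str]:
    """First reading: a missing label is a failure, with no second lookup."""
    return zone.get(label)


def idna2003_style_variants(label: str) -> List[str]:
    """Illustrative variant rule: map Eszett (U+00DF) to 'ss'."""
    mapped = label.replace("\u00df", "ss")
    return [mapped] if mapped != label else []


def lookup_with_fallback(
    label: str,
    zone: Dict[str, str],
    variants: Callable[[str], List[str]],
) -> Optional[str]:
    """Second reading: on a miss, try alternate labels before giving up."""
    answer = zone.get(label)
    if answer is not None:
        return answer
    for alternate in variants(label):
        answer = zone.get(alternate)
        if answer is not None:
            return answer
    return None
```

Under the first reading, a query for the Eszett spelling simply fails.  Under the second, it silently resolves to whatever is registered under "strasse", and the list of variant rules is limited only by what the implementer chooses to plug in, which is the open-ended behavior described above.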

All I'm sure of is that we are going into dangerous territory by 
trying to specify those actions because folks who believe that 
they have either better judgment about their users (or different 
motives) than we do will simply ignore us.  And, IMO, that is 
actually a corollary to your comments about the interaction 
among too-clever page designers, browser leniency, and long-term 
support.  FWIW, we've got 30+ years of experience with this in 
the email community, where it inspires bad jokes about the 
robustness principle gone bad.

> This is pretty common on the Web. The browsers are too
> lenient, and so they end up having to support the Web sites
> and HTML files that take advantage of that leniency forever.

Well, up to the point that some particular flexibility/leniency 
starts being seen as an invitation to attacks and security 
problems.  Then one either needs to give the flexibility up or 
to adopt ever-more-complex heuristics to distinguish between 
"possible attack" and "some clever idea that someone used once 
and that we still need to support".  Judging again from the 
email experience, browser vendors will sooner or later get to 
the point where the workarounds for the heuristics for the 
previous workarounds get complicated enough that it becomes 
clear that the right answer is to respond to some 
construction by saying "this is bogus enough, and risky enough, 
that it is better to refuse to access, render, or receive it 
than to try to assume that the sender/page creator was 
well-intentioned and to guess what was meant".

Things obviously haven't gotten bad enough in the web community 
yet, but the email one has a decade or two "head start".

> Maybe I'm too pessimistic here, but I am quite concerned about
> multiple lookup.

I am too, but, possibly because I don't understand them well 
enough, your assumptions and workarounds seem even worse.  That 
is at least partially because I can see ways to "sunset" 
multiple lookups, at least for ordinary users who are mostly 
looking at materials and sites that are more or less 
contemporary.  I think the things you are suggesting support 
deliberately broken behavior and require retaining it forever, 
_even_ after the inevitable security problems and abuses 
--problems that almost always show up when we design systems to 
deduce what the user really intended-- become obvious.

    john



More information about the Idna-update mailing list