Proposed new Firefox IDN display algorithm

Fri Feb 17 16:52:52 CET 2012

Hi John,

Thanks, as always, for your thoughtful input.

On 06/02/12 20:24, John C Klensin wrote:
> * I think your background/ problem statement is misleading.
> That may distort some of the rest of the document.  Your choice
> is not "to display or not to display".  A-labels are a display
> option, not non-display.  U-labels are another display form.  So
> are "????" and little boxes.  That would be just a pedantic
> distinction except for two things.  One is that you have a
> family of other options: display in lurid colors, pop-up
> warnings if someone tries to click on a link, or just outright
> refusal to use the URL... perhaps more.

This is true, although I've outlined the advantages of A-label display 
in error conditions in other messages in this thread. However, if it 
turns out that this change allows us to display the vast majority of 
used IDNs, there may be a case for some more severe error condition when 
we hit one we think we shouldn't. Is that what you are suggesting?

> The other is that,
> while you can safely assume that A-labels can be displayed, you
> cannot guarantee correct display and rendering of U-labels
> unless Firefox starts carrying around its own reference fonts
> and rendering routines.  There are actually some things to
> recommend the latter but keeping the footprint small is not one
> of them.

We do actually carry around our own rendering routines:
http://en.wikipedia.org/wiki/HarfBuzz#HarfBuzz

We don't carry around our own fonts. But what point are you making here? 
Is it that, if we find (somehow) that we are unable to display a U-label 
correctly, we should do something other than what happens by default now, 
which I suspect is little boxes with numbers in them? (Although I haven't 
checked - can you point me at a test domain name which uses highly 
obscure Unicode characters?)

> A-label display or something else.   In other words, at least in
> the proposal, I'm trying to get you to separate "identification
> of a label that deserves worrying about" (or "identification of
> a label which is safe" with all others defaulting into "worry
> about") from what you do about that label or the FQDN of which
> it is part.

OK. If currently we have:

                    Can Display          Can't Display

Worrisome Label    A-label              A-label

Good Label         U-label              Little boxes

How would you have us modify that matrix? (This assumes for the sake of 
argument that it's possible to distinguish between the bottom two boxes.)
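
To make the matrix concrete, here is a rough Python sketch of it. This is 
only an illustration, not Firefox code: the function name and the two 
boolean inputs ("worrisome" and "renderable") are made up for the example, 
and it uses Python's built-in "idna" codec (which implements IDNA2003) to 
produce the A-label form. The "little boxes" row is simulated by replacing 
non-ASCII characters with U+25A1.

```python
def display_form(fqdn: str, worrisome: bool, renderable: bool) -> str:
    """Return what would be shown for `fqdn` under the matrix above.

    `worrisome` and `renderable` are hypothetical inputs; how they are
    computed is exactly what is under discussion in this thread.
    """
    if worrisome:
        # Top row: A-label display, whether or not we could render it.
        return fqdn.encode("idna").decode("ascii")
    if renderable:
        # Bottom-left: U-label display.
        return fqdn
    # Bottom-right: stand-in for missing-glyph "little boxes".
    return "".join(c if c.isascii() else "\u25a1" for c in fqdn)


name = "b\u00fccher.example"
print(display_form(name, worrisome=True, renderable=True))
print(display_form(name, worrisome=False, renderable=True))
print(display_form(name, worrisome=False, renderable=False))
```

Separating the two inputs like this is, I think, the split you are asking 
for: one step classifies the label, another decides what to do about it.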

> (1) The rule about ZWJ/ZWNJ (and maybe some other things) is
> that labels containing them should not be displayed without
> special treatment unless they are effectively visible on
> display.  For the purpose of that classification, the CONTEXTJ
> requirements of IDNA are a starting point: you should probably
> do at least that and might want to go further.

Yes; I don't think this document is meant to override any display rules 
in IDNA2003 or IDNA2008 (whichever one we are trying to implement at the 
time).
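
For reference, a simplified sketch of the kind of check the CONTEXTJ rules 
imply. As I understand RFC 5892, ZWJ is permitted only directly after a 
Virama (canonical combining class 9), and ZWNJ is permitted either after a 
Virama or in certain joining-type contexts. The sketch below implements 
only the Virama part and deliberately omits the joining-type rule for ZWNJ, 
so it is stricter than the real rule. The function name is made up.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # canonical combining class shared by Virama characters


def joiners_follow_virama(label: str) -> bool:
    """True if every ZWJ/ZWNJ in `label` directly follows a Virama.

    Simplification: the additional joining-type rule that can also
    permit ZWNJ is not implemented here.
    """
    for i, ch in enumerate(label):
        if ch in (ZWJ, ZWNJ):
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


# Devanagari KA + VIRAMA + ZWJ + SSA: the joiner follows a Virama.
print(joiners_follow_virama("\u0915\u094d\u200d\u0937"))
# A joiner between two Latin letters has no Virama before it.
print(joiners_follow_virama("a\u200db"))
```

A label failing a check like this would fall into the "worrisome" row 
rather than being displayed as a U-label.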

> (2) It seems to me that one of the problems with any strategy
> that tries to decide between "safe" names and ones that need
> special treatment on any basis other than experience with, or
> the reputation of, the particular domain or site is doomed to
> mistreat some sites and registrants who are perfectly ok in
> order to protect against sloppy behavior or bad deeds of someone
> else.  That is true with your old policy, your proposed new
> policy, and the policies of the other browsers.  The differences
> are just about who gets hurt and why.    One of the things we
> now know that we didn't when you first deployed your policy is
> that the policy seems to have about zero effect on registrant
> choices about the TLDs in which to register.  If the policy
> caused a significant change in registrant decisions, we can
> assume that some obvious popular TLDs would be beating a path to
> your door to sign up.    While it appears initially to be a
> separate set of issues, I think the decision in recent versions
> of Firefox to hide protocol identifiers in the displayed form of
> the address bar may be a liability for the IDN case.  Suppose
> you could actually be careful about either the certificate
> authorities you recognized by default or, as you do with TLDs
> under these policies, you did some classification of "trusted"
> and "less trusted".    In that case, perhaps a site that was
> accessed via HTTPS and that presented a certificate from a
> trusted CA could be exempted from special treatment regardless
> of the TLD in which their domain appeared.   In other words,
> normal ("Unicode") display would be possible if the TLD appeared
> on the whitelist, or you got a cert that you actually trusted,
> or the string passed your heuristic.

To clarify: you are suggesting that sites with a cert from a trusted CA 
should get U-labels regardless?

CA certificates are about identity, not about "good-ness" or honesty. I 
have, coincidentally, just been arguing in another, CA-related, forum 
that they should by no means put restrictions on which domain names 
people can get certificates for, because deciding which domain names 
should and shouldn't exist is the job of registries, and once they've 
done that, a CA should go along with it. It should not be possible to 
register a domain and yet not be able to get a cert for it because the 
registry is fine with the name you pick but the CA isn't. I can imagine 
domain owners being quite put out if that could/did happen.

> (3) Personally, I'd add a user-specific FQDN (not TLD) whitelist
> to that list of hard exceptions.

Users can technically add TLDs to the list. (I don't think they can add 
FQDNs.)

> In order to avoid even more
> databases/ tables, suppose you checked a string to be displayed
> against the user's bookmark list _and_ created some extra
> warning and explanation if the user decided to bookmark a string
> that you considered suspicious.  If the user decided to bookmark
> the site despite those warnings/ explanations, maybe you should
> believe that he or she knows what they are doing and get out of
> the way.

I'm not sure enough users use bookmarks in the "traditional" way like 
that (or at all) for it to be a good idea to bake them into the security 
strategy.

> (4) While I think the new classification data for "Common" and
> "Inherited" scripts that Mark identifies will be useful,
> especially to registries who are trying to make intelligent
> decisions about what to accept, I want to caution against doing
> anything at lookup time that is dependent on an inferred
> language.

I tried to write the document in terms of "script", not "language" - I 
understand the perils of trying to infer the latter. Are you suggesting 
I haven't succeeded? If so, point me at the broken bits.

> (5) One of the disadvantages of going to A-label display is
> that, while A-labels provide a good clue that something unusual
> is going on, users may vary widely as to whether that is taken
> as a warning sign or just one of many incomprehensible things
> that happens on the Internet.  If the user, grandmother or not,
> is using the network by rote and with the assumption that a
> great deal of it is just magic, then it is possible that any
> A-label is confusable with any other A-label.  As long as
> A-label display is rare and the user never has the experience of
> having an intended and safe FQDN displayed in A-labels, that is
> probably ok.  But, if A-label display becomes common and some of
> the labels thus displayed are actually associated with
> reasonable domains and safe sites, the warning value of that
> technique will deteriorate significantly unless users can
> remember which A-labels have been visited before and are ok.  I
> don't think we can count on that.

I agree that seeing A-labels should be a rare thing, and perhaps one of 
the downsides of the current implementation is that users might see them 
more often than one would like. I hope the new proposal will reduce the 
incidence of this. Given that, I can't quite see your point?

> (6) I think your heuristic itself is about as good as you are
> going to get.  I might be able to suggest small variations, but
> I think they would mostly just shift the false negatives around
> a bit rather than resulting in substantial improvement.   But I
> have to wonder about your threat model.  If the goal is to say
> "something needs to be done and we are doing as much as we
> reasonably can, even if it is not likely to be very effective",
> I have no problems with that.  But, if you want to move well
> beyond that, especially in a world in which I have to expect
> ICANN's methods to prevent confusing name pairs at the top level
> will fail in at least some significant cases (explanation on
> request),

By private mail, please :-)

> it.   I suggest, as noted above, that we could learn the real
> lesson from the "paypal" example that started most of us down
> this path, which is that the real problem maybe isn't the name
> but the ease by which a bogus organization with a fake identity
> can obtain a certificate simply by "owning" a domain name and
> mailbox for a short time.

There are certainly issues with the certificate system, but I think they 
are mostly orthogonal to issues with the display of IDNs.

Gerv