Browser IDN display policy: opinions sought

Wed Dec 21 18:24:43 CET 2011

Folks,

Gerv's rather focused question seems to have turned into a
rather long and far-ranging set of threads.  I've learned a lot
from it; perhaps others have too. 

With the understanding that I'm speaking only for myself, let me
urge that people take a little bit of a break in honor of the
end of the conventional solar calendar year (and whatever
holidays people are celebrating, if relevant) and let some of
this absorb.  My sense of the high points / separate issues is:

(1) Most browser vendors feel a need to protect their users
against known threats.  Whether the "confusion" or "phishing"
problems are appropriate threats to be considered in that
context is a separate issue (#2 below), as is how that
protection should be offered (#3 below).  Convincing them that
they should not be worried about such threats is probably a lost
cause.

(2) Regardless of how we feel about it and why, it is fairly
clear that the potential for confusion with a 37 character
repertoire is far less than the potential with a repertoire of
circa 50K characters or more.    That statement is true whether
the 37 glyphs are selected from "basic Latin" or just about any
other script.  In addition, there are huge differences between
confusion because of some inherent properties of the characters
(this is, I believe, the topic that Section 4 of UTR 39  and the
confusables.txt file are intended to address), confusion because
of user perception issues (e.g., "seeing what one expects to
see"), and confusion due to deliberate attacks (of which
phishing is a key, but not the only, example).  Whether the same
remedies (browser or otherwise) suit all three is an
open question.  While I may be wrong, it continues to appear to
me that different browser strategies put different emphasis on
those three issues by, e.g., assuming that I'm more likely to be
confused by characters in an unfamiliar script than by a
familiar one.
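
To make the difference between inherent-property confusion and
script familiarity concrete, here is a minimal Python sketch.  It
is an illustration only, not any browser's actual policy.  The
helper names rough_scripts and looks_mixed are hypothetical, and
the script of a character is approximated by the first word of
its Unicode character name, since the Python standard library
does not expose the Unicode Script property.  It does not consult
confusables.txt.

```python
import unicodedata


def rough_scripts(label):
    """Return a rough set of scripts used in a label.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC") stands in for the real Script property.
    """
    scripts = set()
    for ch in label:
        # Digits and hyphens are shared by many scripts, so skip them.
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def looks_mixed(label):
    """True when the label draws letters from more than one rough script."""
    return len(rough_scripts(label)) > 1


plain = "paypal"
spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A (U+0430)

print(sorted(rough_scripts(plain)))  # ['LATIN']
print(sorted(rough_scripts(spoof)))  # ['CYRILLIC', 'LATIN']
print(looks_mixed(plain), looks_mixed(spoof))  # False True
```

A check like this only catches mixed-script labels.  It says
nothing about look-alikes within a single script, nor about a
user "seeing what one expects to see".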

(3) If protection is going to be offered via the User Interface
(browser or otherwise), the choice of mechanism (Punycode, refusal
to render at all (via question marks, funny boxes, etc.),
highlighting, popups, etc.) is ultimately going to be a matter
of taste.  Some styles may be more appropriate to some browsers
than others, some more to some customers than others, and so on.
Even when appropriate fonts are not available for displaying the
native-character string, I imagine we could have a long debate
about whether it would be better to display Punycode or the
"undisplayable character" symbols of choice.   There are good
reasons why the IETF rarely enters deeply into that area and we
may be illustrating at least some of them.
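
For readers who have not looked at the two display forms side by
side, the following Python sketch converts a single label between
its native-character form and its Punycode ("xn--") form.  It is
a rough illustration under stated assumptions: the helper names
to_ace, to_unicode and display_form are hypothetical, the
show_native flag stands in for whatever policy a browser applies,
and no IDNA2008 validity checking or mapping is performed.  Only
the standard library "punycode" codec is used.

```python
def to_ace(label):
    """Encode one label in its ASCII-compatible ("xn--") form."""
    if label.isascii():
        return label  # already ASCII, nothing to encode
    return "xn--" + label.encode("punycode").decode("ascii")


def to_unicode(label):
    """Decode one "xn--" label back to native characters."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def display_form(label, show_native):
    """Pick the form to show; show_native is a placeholder for the
    browser's own decision (script checks, registry lists, etc.)."""
    native = to_unicode(label)
    return native if show_native else to_ace(native)


print(to_ace("b\u00fccher"))                  # xn--bcher-kva
print(to_unicode("xn--bcher-kva"))            # bücher
print(display_form("xn--bcher-kva", False))   # xn--bcher-kva
```

Which of the two strings is less confusing to a given user is
exactly the matter-of-taste question above; the code only shows
that either is mechanically available.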

(4) There is actually an IDNA2008 requirement that registries
handling IDNs establish policies for the strings that they are
willing to accept for registration.  A check on whether such
policies exist (however that is accomplished) is ultimately just
a check on conformance to the Standard.   Evaluating whether
the policies are reasonable and/or adequate and/or actually
followed and enforced is, of course, a much different and harder
matter.

(5) Independent of how it is accomplished --or whether, in
today's environment, it can be accomplished at all-- it is clear
that the problems that could be evaluated and protected against
at the client UI end of things would be much reduced if there
were effective push-back against, or prevention of, deliberately
problematic registrations and delegations of names.    Some of
the browser policies started out as ways to put pressure on
registries to adopt such restrictions.    Unfortunately, most
"shun the bad guys" models work much better when almost everyone
conforms to community norms and only a few exceptional cases
need special treatment.   When a large fraction of the cases
need the special "your policy model and its enforcement aren't
good enough" treatment, the whole approach becomes somewhat less
effective (whether it is enough less effective to be worth
dropping depends on judgment calls about tradeoffs between risks
and benefits).

If people are going to continue the discussion, and any
characterization like the above is helpful, might I suggest
separate threads?

  best,
    john