Browser IDN display policy: opinions sought
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Sat Dec 10 13:51:43 CET 2011
Hello Gervase, others,
I'm replying to the original post because I didn't find the right
message later in the thread to reply to.
About "payp-cyrillic-a-l.com": Exactly 0 people got robbed there, and
way too many got scared. That very clearly includes the browser makers,
who heavily overreacted.
I personally am very much with John in that I want to see as many IDNs
as possible as what they are: Unicode characters, not Punycode. I tell my
browser all the languages that I can understand to some extent, but I
don't want to tell it that I understand Korean (because I don't), even
though I know Hangul, ... On top of that, I don't see any problems at
all with scripts for which I would have to consult a code chart,
definitely not if they are as far apart from what I use daily as,
e.g., Devanagari. But even leaving personal preferences aside, I don't
know why we can't try to display as much Unicode as we reasonably can in
the address/location bar, the same way we display Unicode in the page.
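
To make the Unicode-versus-Punycode distinction concrete, here is a
minimal sketch using Python's built-in "idna" codec (which implements the
older IDNA 2003 rules). The hostname "bücher.example" and the helper names
to_ascii/to_unicode are my own illustration, not anything a browser uses.

```python
# Convert a hostname between its Unicode form and its Punycode (ACE) form.
# Uses the standard library "idna" codec, which follows IDNA 2003.

def to_ascii(hostname: str) -> str:
    """Return the ASCII-compatible (xn--) form of a hostname."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the Unicode form of a hostname given in ACE form."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("bücher.example"))           # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

The second form is what users currently get to read whenever a browser
decides not to show the real characters.
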
I never really liked the type B approach for the reasons mentioned by
others, but I also think that the A approach is way too restrictive.
I think none of the browsers have made more than a very quick stab at
the issue. The A type browsers could easily extend their stuff to
include all the scripts that the user won't confuse (such as Indic and
East-Asian scripts for typical European users). On the other hand, they
might want to be more careful for whole-script confusables for users who
declare to read both (one of the many languages associated with) Latin
The B type browsers could easily ADD A-type stuff (including the above
improvements). They could also add some script-mixing detection to be
able to be more generous with their TLD screening process.
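
As a rough idea of what such script-mixing detection could look like,
here is a sketch in Python. It is only an approximation: the standard
library does not expose the Unicode Script property, so the script is
guessed from the first word of the character name, and the function names
are hypothetical. A real implementation would use proper script data and
would have to allow legitimate combinations (e.g. Han with Hiragana and
Katakana for Japanese).

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess the script of a character from its Unicode name (assumption:
    the first word of the name is a good enough stand-in for the script)."""
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "COMMON"  # digits and hyphen are shared by all scripts
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def scripts_in_label(label: str) -> set:
    """Return the set of (approximate) scripts used in one domain label."""
    return {rough_script(ch) for ch in label} - {"COMMON"}


def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    # "p\u0430ypal" contains CYRILLIC SMALL LETTER A among Latin letters.
    for label in ["paypal", "p\u0430ypal", "\u043f\u0440\u0438\u043c\u0435\u0440"]:
        print(label, sorted(scripts_in_label(label)), is_mixed_script(label))
```

A check like this would catch the single-character substitution case, but
not whole-script confusables, which need a separate treatment.
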
I can't really judge type C, it very much depends on how big the
whitelist is. If it's rather big, then C looks very good to me,
Also, now that we have non-ASCII TLDs, that gives us some new ideas. We
should be able to assume that ICANN won't open the door to visual spoofing
at the TLD level, e.g. by not allowing whole-script confusables in
Cyrillic or Greek. That should mean that cyrillic.cyrillic and
equivalents are safe to display. And these are incidentally the domains
where IDNs are really at their best, and where the growth should go.
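
The cyrillic.cyrillic rule can be sketched as a display decision: show
Unicode only when the TLD is in a single non-Latin script and every other
label stays within that same script, and fall back to Punycode otherwise.
This is a deliberately narrow sketch of that one rule, not a complete
policy; the script guessing from character names and the function names
are my assumptions.

```python
import unicodedata


def label_scripts(label: str) -> set:
    """Approximate set of scripts in a label, ignoring digits and hyphen."""
    scripts = set()
    for ch in label:
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def display_form(hostname: str) -> str:
    """Return the Unicode hostname if all labels share the TLD's single
    non-Latin script; otherwise return the Punycode (ACE) form."""
    labels = hostname.split(".")
    tld_scripts = label_scripts(labels[-1])
    same_script = (
        len(tld_scripts) == 1
        and "LATIN" not in tld_scripts
        and "UNKNOWN" not in tld_scripts
        and all(label_scripts(label) <= tld_scripts for label in labels)
    )
    if same_script:
        return hostname
    # Built-in codec (IDNA 2003) used only to produce the fallback form.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(display_form("\u043f\u0440\u0438\u043c\u0435\u0440.\u0440\u0444"))  # Cyrillic under a Cyrillic TLD
    print(display_form("p\u0430ypal.com"))  # Cyrillic "a" under a Latin TLD
```

Other rules (language settings, script whitelists) could then widen what
gets displayed beyond this one safe case.
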
So the question for Mozilla (and other browser vendors) isn't "should we
switch from type X to type Y", but "how can we increase (potentially
drastically) the number of IDNs we can display without creating visual
spoofing traps". If Mozilla is able to show all IDNs that IE shows, and
some more on top of that (without including something like
payp-cyrillic-a-l.com), then that can only be an argument for using
Mozilla, not against it. And that's not the slippery slope of
bugwards-compatibility that continues to haunt HTML, but simply
displaying the data that's there correctly.
On 2011/12/09 20:12, Gervase Markham wrote:
> Recently, Mozilla community member Jothan Frakes was kind enough to do
> some research about how different popular web browsers implement IDN,
> and when they display the real characters and when they display
> Punycode. This is in the context of a Mozilla review of our policy. I am
> interested in the opinions of people on this list (see below).
> As it turns out, the behaviour of all popular browsers is summarised at
> the bottom of a Chromium project document here:
> The policies fall into 3 approximate buckets:
> A (IE, Chrome): Unicode if the (single) 'language' of the string is
> configured in the options, Punycode otherwise.
> B (Firefox, Opera): Unicode if the TLD is in a whitelist, Punycode
> otherwise. Arbitrary script mixing permitted (registry policy used to
> prevent abuse).
> C (Safari): Unicode if the script is in a whitelist (which by default
> does not include Cyrillic or Greek), Punycode otherwise. Not sure about
> script mixing.
> Firefox has historically resisted adopting a Type A policy because we
> consider it seriously detrimental to IDN adoption and use. It seems to
> me that IDN can never be reliable for site owners, and therefore will
> not succeed, if a significant proportion of the world's browsers adopt
> Type A or Type C policies. This is because site owners can never know
> what proportion of their visitors will see gobbledegook in the URL bar
> rather than their nice domain name. Perhaps for sites whose visitors are
> all guaranteed to be from a particular country or language group, with
> properly-configured browsers and OSes which know that they speak a
> certain language or use a certain script, it might work - but I suggest
> that's a small subset of all sites. Many people in non-English-speaking
> countries still use English OSes and English browsers, with default
> Type C is particularly bad - Russian and Greek IDNs are broken by
> default, but even if you persuade your users to turn it on, they can
> then be mixed-script spoofed. You get to choose between functionality
> and security.
> By contrast, with a Type B policy, if your IDN domain works in one copy
> of Firefox, it works in them all. If everyone had Type B policies, there
> would be no risk of a properly-registered domain coming up as gibberish.
> It has been suggested that Firefox switch to a Type A policy. As it is,
> the mix of policies means that the goal of universal acceptability is
> not being met anyway. Firefox switching to Type A would also not meet
> that goal by itself, but one could argue that there's a bit more
> consistency to browser behaviour.
> I would be interested in the opinion of people on this list as to:
> - whether my analysis seems reasonable;
> - whether they prefer type A, B or C; and
> - whether they see any particular policy as more damaging to IDN
> adoption than another.
> Has anyone lobbied one browser manufacturer or another to change their
> policy? Is there another option that is not currently in use which would
> be better?
> (Note that "no restrictions" is not an option, given what happened in
> 2005 with payp-cyrillic-a-l.com, and I would rather not derail this
> debate by rehearsing those arguments again.)