Proposed new Firefox IDN display algorithm
gerv at mozilla.org
Sat Feb 4 17:27:00 CET 2012
On 02/02/12 22:12, Andrew Sullivan wrote:
> Under "Other Browsers", you have this: "[…]this does not give site
> owners any confidence that their IDN domain name will be correctly
> displayed for all their visitors (and no way of telling if it's not)."
> I hope it's clear that, in fact, no matter what you do you have no
> hope of fixing this problem.
We will then settle for the lesser goal of making it so that the problem
is not our fault. :-)
> protocols. (Also, just a nit: "IDN" stands for Internationalized
> Domain Name, so "IDN domain name" is redundant.)
Fixed, thank you.
> Under "Proposal", you have this: "The hope is that any intra-script
> near-homographs will be recognisable to people who understand that
> script." The problem with this is that with very few exceptions,
> _nobody_ understands a script. English and French, both of which I
> speak, are nominally written in the same script, for instance, but
> they use different parts of it; and they're about as close to one
> another as you can get. The character U+00D8, LATIN CAPITAL LETTER O
> WITH STROKE (Ø) is certainly part of Latin, but unless I see it in the
> right context, I'll read it as DIGIT ZERO.
<shrug> A fair point, which does not surprise me. The proposal
intentionally shifts the balance between displaying Unicode and not
displaying it, so there is bound to be an increased number of potentially
problematic cases. As the document says, some of the problem has to be
solved by registries.
If you can prove this is a truck-sized loophole and have a suggestion
for closing it, I'm all ears :-)
> Under "Algorithm", you have this: "If a TLD is in the whitelist, we
> will unconditionally display Unicode." Why do you believe that the
> TLD policies help?
The position is more that "removing them might have unintended
consequences, such that IDNs which used to work no longer do". It also
gives us the flexibility to give more leeway to a registry with good rules.
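The whitelist rule quoted above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Firefox code: `WHITELISTED_TLDS`, its entries, and the `label_is_acceptable` callback (a stand-in for whatever per-label check applies to non-whitelisted TLDs) are all hypothetical.

```python
from typing import Callable

# Hypothetical, illustrative entries only; this is NOT the real whitelist.
WHITELISTED_TLDS = {"de", "jp"}


def should_display_unicode(
    hostname: str, label_is_acceptable: Callable[[str], bool]
) -> bool:
    """Decide whether to show a hostname in its Unicode form.

    label_is_acceptable is a hypothetical per-label check, consulted
    only when the TLD is not whitelisted.
    """
    labels = hostname.rstrip(".").split(".")
    tld = labels[-1].lower()
    if tld in WHITELISTED_TLDS:
        # Whitelisted TLD: display Unicode unconditionally.
        return True
    # Otherwise every label must pass the per-label check.
    return all(label_is_acceptable(label) for label in labels)
```

Treating the whitelist as a short-circuit means names under a TLD whose IDNs already display keep doing so, which is the "unintended consequences" concern above.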
> None of the gTLDs, as far as I am aware, has a
> policy that old-fashioned LDH names can't have U-labels beneath them.
> Might it be enough for an attacker to put
> arabic-label.arabic-label.arabic-label.badguy.com, and expect the
> ASCII to get ignored? (Maybe this is supposed to be solved by the
> greying out of everything not near the top of the tree?)
As you say.
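The greying-out answer to Andrew's subdomain attack can be illustrated with a helper that splits a hostname into a de-emphasised prefix and an emphasised suffix. As a simplifying assumption, the sketch treats the last two labels as the part "near the top of the tree"; a real implementation would more likely consult the Public Suffix List so that names like `example.co.uk` split correctly. The name `split_for_emphasis` is hypothetical.

```python
def split_for_emphasis(hostname: str, keep: int = 2) -> tuple[str, str]:
    """Split a hostname into (greyed-out prefix, emphasised suffix).

    Simplifying assumption: the emphasised part is the last `keep`
    labels, rather than the result of a Public Suffix List lookup.
    """
    labels = hostname.rstrip(".").split(".")
    if len(labels) <= keep:
        # Nothing to grey out: the whole name is emphasised.
        return "", ".".join(labels)
    prefix = ".".join(labels[:-keep]) + "."
    suffix = ".".join(labels[-keep:])
    return prefix, suffix
```

For a name shaped like arabic-label.arabic-label.arabic-label.badguy.com, the emphasised part is `badguy.com` and the Arabic-script labels fall into the greyed-out prefix.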
> Also in that section is discussion of using the data from Unicode
> 6.1. While I think this could be a good idea and I think it's worth
> considering carefully, I'm slightly worried about two things. First,
> this is a new feature of Unicode, and it's hard to predict how well it
> will work in practice. Second, are you planning just to code this
> into the browser, or are you planning on using the local Unicode
> facilities on the machine? The latter seems preferable to me, but it
> means that you don't get this facility until Unicode 6.1 is on the
> machine (and of course, you can never get it for IDNA2003, since
> that's pinned to a pre-6.1 version of Unicode).
My understanding is that Firefox already contains Unicode character
class data. The plan would be to expand it to include this data also.
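The kind of per-label script check that such data enables can be sketched as below. Python's standard `unicodedata` module does not expose the Unicode Script property, so `rough_script` is a crude, hypothetical stand-in that guesses the script from the first word of the character's name; a browser would use real script tables instead.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a character's script from its Unicode name.

    Crude stand-in for the real Script property: 'LATIN SMALL LETTER A'
    maps to 'LATIN', 'CYRILLIC SMALL LETTER A' to 'CYRILLIC'.
    ASCII digits and the hyphen are treated as script-neutral.
    """
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_single_script(label: str) -> bool:
    """True if the label uses at most one script besides COMMON."""
    scripts = {rough_script(ch) for ch in label} - {"COMMON"}
    return len(scripts) <= 1
```

With this check, `paypal` passes, while `paypal` with CYRILLIC SMALL LETTER A (U+0430) as its second character fails. A label mixing Ø (U+00D8) with ASCII digits still passes, since Ø is Latin; that is exactly the intra-script near-homograph Andrew describes. Legitimate mixtures such as Japanese, which routinely combines Han with Hiragana or Katakana, would also fail this naive version.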
> Finally, under that section, you have a plan to "display Punycode" in
> some cases. As others already suggested in this thread, that seems as
> bad as anything else: A-labels are confusing to _everybody_. (You
> have this as an open question, sort of, but assume you're going to
> display A-labels no matter what. I think that's a mistake.)
Displaying the A-label has the significant advantage of removing the
potentially confusable string from the user's view and replacing it with
something which has no chance of being confused with any other
normally-used domain, while otherwise providing minimal disruption to
their browsing experience. I'm not sure I could write an error message
about this that my Grandma could understand, and I'm not sure what
action I would recommend that she take when viewing it anyway.
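Displaying the A-label amounts to converting each non-ASCII label to its ASCII-compatible encoding. A minimal sketch using Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490) rather than IDNA2008; the function name `a_label_form` is hypothetical.

```python
def a_label_form(hostname: str) -> str:
    """Return the hostname with every non-ASCII label as an A-label.

    Python's built-in 'idna' codec applies IDNA2003 (RFC 3490):
    nameprep, then Punycode with the 'xn--' prefix. Pure-ASCII
    labels are left as they are.
    """
    return hostname.encode("idna").decode("ascii")
```

Here `bücher.example` becomes `xn--bcher-kva.example`, and a Latin-looking label containing a Cyrillic letter turns into an obviously foreign `xn--` string, which is the "no chance of being confused" property argued above. Because the built-in codec implements IDNA2003, its nameprep step uses Unicode 3.2 data; IDNA2008 handling would need something like the third-party `idna` package.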
> Thanks for posting this, and for the invitation to comment. I hope
> these comments are useful.
Very - thank you.