Proposed new Firefox IDN display algorithm

Gervase Markham gerv at mozilla.org
Sat Feb 4 17:27:00 CET 2012


On 02/02/12 22:12, Andrew Sullivan wrote:
> Under "Other Browsers", you have this: "[…]this does not give site
> owners any confidence that their IDN domain name will be correctly
> displayed for all their visitors (and no way of telling if it's not)."
> I hope it's clear that, in fact, no matter what you do you have no
> hope of fixing this problem.

In that case we will settle for the lesser goal of making sure the 
problem is not our fault. :-)

> protocols.  (Also, just a nit: "IDN" stands for Internationalized
> Domain Name, so "IDN domain name" is redundant.)

Fixed, thank you.

> Under "Proposal", you have this: "The hope is that any intra-script
> near-homographs will be recognisable to people who understand that
> script."  The problem with this is that with very few exceptions,
> _nobody_ understands a script.  English and French, both of which I
> speak, are nominally written in the same script, for instance, but
> they use different parts of it; and they're about as close to one
> another as you can get.  The character U+00D8, LATIN CAPITAL LETTER O
> WITH STROKE (Ø) is certainly part of Latin, but unless I see it in the
> right context, I'll read it as DIGIT ZERO.

<shrug> A fair point, and not one that surprises me. The proposal 
intentionally shifts the balance between displaying Unicode and not 
displaying it, so there are bound to be more potentially problematic 
cases. As the document says, part of the problem has to be solved by 
registries.

If you can prove this is a truck-sized loophole and have a suggestion 
for closing it, I'm all ears :-)
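
To make the Ø example concrete, here is a rough sketch of a per-label 
script check. It is only an illustration, not the proposed Firefox code: 
Python's standard library does not expose the Unicode Script property, so 
the hypothetical helper rough_script() approximates it with the first word 
of the character name, and all function names are invented for this sketch.

```python
import unicodedata

def rough_script(ch):
    # ASSUMPTION: approximate the script by the first word of the Unicode
    # character name ("LATIN", "CYRILLIC", ...). A real implementation
    # would use the Script property from the Unicode Character Database.
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "COMMON"  # ASCII digits and hyphen belong to no one script
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in_label(label):
    # Set of (approximate) scripts used in one label, ignoring COMMON.
    return {rough_script(c) for c in label} - {"COMMON"}

def is_single_script(label):
    return len(scripts_in_label(label)) <= 1

# Mixed Latin/Cyrillic is caught: U+0430 is CYRILLIC SMALL LETTER A.
print(is_single_script("p\u0430ypal"))
# An intra-script lookalike is not: U+00D8 (Ø) is Latin, like the rest.
print(is_single_script("g\u00d8\u00d8gle"))
```

A check of this kind separates scripts from one another, but it says 
nothing about lookalikes inside a single script, which is the gap being 
discussed here.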

> Under "Algorithm", you have this: "If a TLD is in the whitelist, we
> will unconditionally display Unicode."  Why do you believe that the
> TLD policies help?

The position is more that "removing them [the whitelist entries] might 
have unintended consequences, such that IDNs which used to work no 
longer do". It also gives us the flexibility to give more leeway to a 
registry with good rules.
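
As a minimal sketch of how the whitelist rule quoted above could sit in 
front of the other checks: the whitelist contents, the function name and 
the passes_script_checks flag are all placeholders invented for this 
illustration, not the real Firefox data or code.

```python
# ASSUMPTION: placeholder whitelist; the real list is maintained by Mozilla.
TLD_WHITELIST = {"example"}

def display_form(hostname, passes_script_checks):
    """Return "unicode" or "punycode" for a hostname.

    passes_script_checks stands in for whatever per-label tests the
    algorithm applies to TLDs that are not on the whitelist.
    """
    tld = hostname.rsplit(".", 1)[-1].lower()
    if tld in TLD_WHITELIST:
        # Whitelisted TLD: display Unicode unconditionally.
        return "unicode"
    return "unicode" if passes_script_checks else "punycode"

print(display_form("label.example", passes_script_checks=False))
print(display_form("label.com", passes_script_checks=False))
```

Keeping the whitelist as the first test means names that display as 
Unicode today continue to do so, whatever the later checks decide.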

> None of the gTLDs, as far as I am aware, has a
> policy that old-fashioned LDH names can't have U-labels beneath them.
> Might it be enough for an attacker to put
> arabic-label.arabic-label.arabic-label.badguy.com, and expect the
> ASCII to get ignored?  (Maybe this is supposed to be solved by the
> greying out of everything not near the top of the tree?)

As you say.
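
A rough sketch of the split that such greying-out relies on is below. It 
assumes, purely for illustration, that the registrable domain is the 
public suffix plus one label and that the suffix is a single label; a real 
implementation would look the suffix up in the Public Suffix List. The 
function name and the suffix_labels parameter are hypothetical.

```python
def split_for_display(hostname, suffix_labels=1):
    """Split a hostname into (de-emphasised prefix, emphasised domain).

    ASSUMPTION: the public suffix is the last `suffix_labels` labels.
    """
    labels = hostname.split(".")
    keep = suffix_labels + 1  # public suffix plus the registered label
    if len(labels) <= keep:
        return "", hostname
    return ".".join(labels[:-keep]) + ".", ".".join(labels[-keep:])

prefix, domain = split_for_display(
    "arabic-label.arabic-label.arabic-label.badguy.com")
print(prefix)  # the part a browser would grey out
print(domain)  # the part left at full emphasis
```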

> Also in that section is discussion of using the data from Unicode
> 6.1.  While I think this could be a good idea and I think it's worth
> considering carefully, I'm slightly worried about two things.  First,
> this is a new feature of Unicode, and it's hard to predict how well it
> will work in practice.  Second, are you planning just to code this
> into the browser, or are you planning on using the local Unicode
> facilities on the machine?  The latter seems preferable to me, but it
> means that you don't get this facility until Unicode 6.1 is on the
> machine (and of course, you can never get it for IDNA2003, since
> that's pinned to a pre-6.1 version of Unicode).

My understanding is that Firefox already contains Unicode character 
class data. The plan would be to expand it to include this data also.

> Finally, under that section, you have a plan to "display Punycode" in
> some cases.  As others already suggested in this thread, that seems as
> bad as anything else: A-labels are confusing to _everybody_.  (You
> have this as an open question, sort of, but assume you're going to
> display A-labels no matter what.  I think that's a mistake.)

Displaying the A-label has the significant advantage of removing the 
potentially confusable string from the user's view and replacing it with 
something that cannot be confused with any other normally used domain, 
while otherwise causing minimal disruption to their browsing 
experience. By contrast, I'm not sure I could write an error message 
about this that my Grandma could understand, and I'm not sure what 
action I would recommend she take on seeing it anyway.
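
For readers who have not seen one, this sketch shows what the A-label 
fallback looks like. It uses Python's built-in "idna" codec, which 
implements IDNA2003 and so is only an approximation of what a browser 
following IDNA2008 would do; the function names and the sample hostname 
are made up for the example.

```python
def to_a_label(hostname):
    # Encode each label to its ASCII-compatible "xn--" form where needed.
    return hostname.encode("idna").decode("ascii")

def to_u_label(hostname):
    # Reverse direction: decode "xn--" labels back to Unicode.
    return hostname.encode("ascii").decode("idna")

name = "b\u00fccher.example"       # "bücher.example"
print(to_a_label(name))            # xn--bcher-kva.example
print(to_u_label("xn--bcher-kva.example") == name)  # True
```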

> Thanks for posting this, and for the invitation to comment.  I hope
> these comments are useful.

Very - thank you.

Gerv



More information about the Idna-update mailing list