What rules have been used for the current list of codepoints?

Gervase Markham gerv at mozilla.org
Fri Dec 15 11:06:51 CET 2006


Erik van der Poel wrote:
> The way I see it, the current discussion is about the set of
> characters to allow in IDNs, both post-IDNA, wrapped in Punycode in
> DNS packets, and pre-IDNA in various encodings in e.g. HTML. In order
> to accommodate various languages, this set looks like it will be quite
> large, and there will be characters that look like each other. The way
> to prevent spoofs is to avoid showing unfamiliar characters to users.

I have to disagree with this.

In many applications of IDNA, the display technology will not know the
language(s) the user understands. The side of a bus is one obvious
example, but even various computer applications may not know. Today's
email clients don't generally know.

It is also true that getting users to understand and heed even the
simplest security-related UI is a hard battle, so the amount they are
required to pay attention to must be kept to the absolute minimum.

Therefore, an IDNA system which relies for its safety on the client
alerting the user to "unfamiliar characters", and the user noticing the
alert and taking action, is dangerous.

(Such a system also discourages the uptake of IDN, because the owner of
an IDN domain name cannot know what proportion of his customers will see
a scary warning message when they try to visit his site. But that's a
separate point from the security one.)

The way to avoid spoofs is, instead, to cut down the set of characters
as far as is reasonably possible and then to require registries to have
policies which do not issue two confusable domain names to different
entities. There are several technical and logistical ways of achieving
this. I see no reason why a registry would object to having a policy
which prevents some of its customers from defrauding others.
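
As a rough illustration of one such technical approach, a registry could
reduce every requested label to a "skeleton" in which look-alike
characters collapse to a single prototype, and refuse a registration
whose skeleton is already held by a different entity. This is only a
sketch: the CONFUSABLES table, the skeleton() helper and the Registry
class are hypothetical names invented for the example, and a real
registry would need a far larger table (for instance, one derived from
the Unicode confusables data).

```python
# Hypothetical, deliberately tiny look-alike table (assumption for this
# sketch). Each entry maps a character to the Latin letter it resembles.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label: str) -> str:
    """Collapse look-alike characters so confusable labels compare equal."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())


class Registry:
    """Toy registry that never issues confusable labels to different owners."""

    def __init__(self) -> None:
        self._owner_by_skeleton: dict[str, str] = {}

    def register(self, label: str, owner: str) -> bool:
        key = skeleton(label)
        holder = self._owner_by_skeleton.get(key)
        if holder is not None and holder != owner:
            # A confusable label already belongs to someone else: refuse.
            return False
        self._owner_by_skeleton[key] = owner
        return True


registry = Registry()
print(registry.register("paypal", "PayPal Inc"))        # first registration
print(registry.register("p\u0430ypal", "Mallory"))      # Cyrillic 'a', other owner
print(registry.register("p\u0430ypal", "PayPal Inc"))   # same owner may hold variants
```

Under these assumptions the second request is refused, while the
original owner remains free to register the variant itself.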

This is the policy that Firefox adopts; we currently display IDN domain
names for around 30 TLDs, the registries for all of which have
anti-spoofing policies. Any registry with such a policy is welcome to
ask to be included in the list, and we will ship the list change in our
next security update.
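
For readers who want the shape of that policy in code, here is a minimal
sketch of a TLD-based display rule; it is not Firefox's actual
implementation. The IDN_DISPLAY_TLDS set and the display_form() function
are hypothetical names, and the two TLDs in the set are placeholders
rather than the real list. The sketch relies on Python's built-in "idna"
codec, which implements the IDNA 2003 conversion.

```python
# Placeholder entries only (assumption); the real list is maintained
# by the browser and is not reproduced here.
IDN_DISPLAY_TLDS = {"de", "jp"}


def display_form(hostname: str) -> str:
    """Return the Unicode form only for TLDs on the trusted list;
    otherwise fall back to the raw Punycode (xn--) form."""
    # Normalise to the ASCII-compatible encoding first.
    ascii_name = hostname.encode("idna").decode("ascii")
    tld = ascii_name.rsplit(".", 1)[-1].lower()
    if tld in IDN_DISPLAY_TLDS:
        # The registry is assumed to have an anti-spoofing policy.
        return ascii_name.encode("ascii").decode("idna")
    return ascii_name


print(display_form("b\u00fccher.de"))   # shown in Unicode
print(display_form("b\u00fccher.com"))  # shown as xn--...
```

The point of the design is that the user is never asked to make a
judgement: a name from a TLD without a known policy is simply shown in
its unambiguous ASCII form.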

> Japanese domain names would not be shown to most American users, and
> that's OK because they can't read them anyway!

That's rather a naive and sweeping statement. Are there no Japanese
Americans? Are there no Japanese visiting America and using Internet
cafes? Are there no Americans learning Japanese? Must all of these
people reconfigure every browser and other IDNA-aware client they use to
tell it all the languages they can read? And, in the case of a client
they are using temporarily, configure it back afterwards to avoid
putting others at risk?

> And most Americans are not interested in doing business with a paypal
> that has one of the letters in Cyrillic, so you just don't show them
> the spoofed name. You show them a warning instead.

This is the mixed-script question, not the unfamiliar character
question. But, to use your example, if payp<cyrillic-a>l.com gets issued
to someone other than the owner of paypal.com, then that is squarely the
responsibility of the .com registry, and they should be taken to task
for it. It should not need to be the user's responsibility to avoid
being taken in.
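
To make the mixed-script case concrete, here is a rough sketch of a
check that could run at registration time rather than in front of the
user. It approximates a character's script by the first word of its
Unicode character name, which is a crude heuristic and an assumption of
this example, not a proper script lookup; scripts_in() and
is_mixed_script() are hypothetical helper names. Note that legitimate
Japanese labels mixing kanji, hiragana and katakana would also be
flagged by this naive version, so a real policy would need per-language
exceptions.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label.

    Heuristic (assumption): the first word of the Unicode character name,
    e.g. 'LATIN SMALL LETTER A' -> 'LATIN'. Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label's letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(is_mixed_script("paypal"))        # all Latin
print(is_mixed_script("p\u0430ypal"))   # Latin plus one Cyrillic letter
```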

Gerv


