What rules have been used for the current list of codepoints?

Gervase Markham gerv at mozilla.org
Sat Dec 16 11:51:49 CET 2006


Erik van der Poel wrote:
 > On 12/15/06, Gervase Markham <gerv at mozilla.org> wrote:
 >> In many applications of IDNA, the display technology will not know the
 >> language(s) the user understands. The side of a bus is one obvious
 >> example
 >
 > The side of a bus is not a security issue. It isn't a serious
 > interoperability issue either.

It is. If the side of the bus says one thing, but I type another thing 
that looks the same, then I end up in the wrong place. This is a 
security risk, depending on the ownership of the "wrong place". 
Therefore, registries should not allow this situation.

 >> but even various computer applications may not know. Today's
 >> email clients don't generally know.
 >
 > Most email clients have a user interface in a single language. That
 > would be the default language to use for IDN display. Some clients may
 > even allow the user to specify multiple languages, as browsers do.

So if I speak four languages, I will get security warnings for the three 
I speak perfectly that aren't the user interface language? Or 
alternatively, every bit of software which deals with IDNs needs to have 
a language-selection UI?  And we have to educate users that the first 
thing they do when they sit down at a piece of Internet software is to 
tell it what languages they speak (rather than the task they actually 
wanted to achieve)?

Does the impracticality of this not just hit you between the eyes?

 >> Therefore, an IDNA system which relies for its safety on the client
 >> alerting the user to "unfamiliar characters", and the user noticing the
 >> alert and taking action, is dangerous.
 >
 > True. That's why we have blacklists of dangerous URLs like:
 >
 > http://www.paypal-com.secure-login.mx23.cc/
 >
 > And this is not even an IDN.

A blacklist system is reactive, high maintenance and always behind. 
Policies which prevent two similar-looking domains being registered to 
two different people are proactive.

Given the choice of noticing someone had registered a confusable domain 
and adding it to a blacklist (probably after people have lost money), or 
having a policy which prevents the registration of it and all other 
domains like it in the first place, I know which is less effort and more 
reliable.
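
As a rough illustration of what such a proactive policy could look like at registration time, here is a minimal sketch. It is not any registry's actual policy: the CONFUSABLES table, the skeleton() helper and the Registry class are all hypothetical, the table is deliberately tiny, and folding accented letters onto their base letter is an assumption made for the example.

```python
import unicodedata

# Hypothetical, deliberately tiny look-alike table. A real registry would
# need a far larger one, agreed as part of its anti-spoofing policy.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}


def skeleton(label):
    """Map a label to a form shared by all of its look-alikes."""
    # Assumption: accented letters are treated as confusable with their
    # base letter, so combining marks are dropped after decomposition.
    decomposed = unicodedata.normalize("NFKD", label.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(CONFUSABLES.get(c, c) for c in stripped)


class Registry:
    """Refuses a label whose skeleton is already held by someone else."""

    def __init__(self):
        self._owner_by_skeleton = {}

    def register(self, label, owner):
        key = skeleton(label)
        holder = self._owner_by_skeleton.get(key)
        if holder is not None and holder != owner:
            return False  # a look-alike belongs to a different registrant
        self._owner_by_skeleton[key] = owner
        return True  # new name, or a variant taken by the same owner
```

With this sketch, once "paypal" is registered, a second registrant asking for the same string with Cyrillic "a" substituted is refused up front, while the original owner may still take the variant. Nobody has to notice the spoof and add it to a list afterwards.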

 >> (Such a system also discourages the uptake of IDN, because the owner of
 >> an IDN domain name cannot know what proportion of his customers will see
 >> a scary warning message when they try and visit his site. But that's a
 >> different point to the security one.)
 >
 > Domain owners are free to register multiple domain names, one for each
 > language that his customers read. So the owner inserts the English
 > domain name in the English ad, the Japanese domain name in the
 > Japanese ad, and so on.

You seem to have this idea that language communities exist in their own 
isolated islands, all reading their own separate advertising and 
content. If things worked this way, why do we not have a Japanese 
Internet, a separate English Internet, and a separate Chinese Internet?

You also assume that being able to read a language is a binary thing. I 
can read enough French to work out how to drive a French e-commerce 
site, but as I don't see E-ACUTE in my everyday English writing 
and reading, I may well miss the fact that I am at a fake site 
registered at a confusable which uses it in place of an e in the original.

 >> This is the policy that Firefox adopts; we currently display IDN domain
 >> names for around 30 TLDs, the registries for all of which have
 >> anti-spoofing policies. Any registry with such a policy is welcome to
 >> ask to be included in the list, and we will ship the list change in our
 >> next security update.
 >
 > I noticed that Firefox does not display Japanese .com names for me
 > even though I have Japanese in my list of languages (since I can read
 > it).

Indeed not. Verisign have not yet come forward with an anti-spoofing 
policy. I look forward to them doing so. Until they do, we have no 
guarantee that they aren't allowing confusables of important .com 
domains to be registered. You, as a technically sophisticated person, 
may not be fooled. But we can't say that of everyone.
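
For readers unfamiliar with the approach, the display rule described above can be sketched roughly as follows. This is an illustration only, not Firefox's code: IDN_DISPLAY_TLDS and display_form() are hypothetical names, and the TLD entries are placeholders rather than the actual list. The sketch relies on Python's built-in "idna" codec to convert the ASCII ("xn--") form back to Unicode.

```python
# Placeholder entries, standing in for TLDs whose registries have a
# published anti-spoofing policy. Not the real list.
IDN_DISPLAY_TLDS = {"de", "jp", "org"}


def display_form(ace_hostname):
    """Return the string a client would show for an ASCII-encoded hostname."""
    host = ace_hostname.lower().rstrip(".")
    tld = host.rsplit(".", 1)[-1]
    if tld not in IDN_DISPLAY_TLDS:
        # No policy on record for this TLD: show the raw "xn--" form.
        return host
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Malformed input: fall back to the raw form rather than guess.
        return host
```

Under these assumptions the same label displays as Unicode under a listed TLD and as its raw "xn--" form under an unlisted one, with no per-user language setting involved.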

 > However, Microsoft IE7 displays that name for me. Firefox is
 > still a relatively minor player. It will be interesting to see whether
 > the Firefox tail can wag the Verisign/Microsoft dog.

I suspect that having them not work in between 15 and 40% of users' 
browsers (depending on which country you are in) would be a significant 
disincentive to deploying a registered IDN domain name.

 >> Are there no Japanese
 >> Americans? Are there no Japanese visiting America and using Internet
 >> cafes? Are there no Americans learning Japanese? Must all of these
 >> people reconfigure every browser and other IDNA-aware client they use to
 >> tell it all the languages they can read? And, in the case of a client
 >> they are using temporarily, configure it back afterwards to avoid
 >> putting others at risk?
 >
 > We are still at the very beginning of the adoption of IDNA, and it may
 > not ever truly catch on, but I suspect that Internet cafes will
 > eventually make it easier for users to change the language of the user
 > interface, including the browser's.

So for want of a little policy-making at this point in the IDN cycle, we 
instead need to revise the user interface on all the world's browsers, 
and educate all of the multi-lingual population?

 > True. It will be interesting to see what happens with Cyrillic domain
 > names in the long run. There is a company called uralweb where the
 > "ural" is in Cyrillic and the "web" is in Latin. Maybe the .ru and
 > .com registries will allow certain script mixtures, as long as it is
 > quite clear which part is Cyrillic and which part is Latin (unlike the
 > paypal spoof).

This is another argument for a registry policy-based solution. Cyrillic 
uralweb.ru would be fine as long as the .ru registry either blocked or 
bundled the ASCII version of uralweb.ru. Under other schemes, this isn't 
possible.

One further question: if I register www.café.org, what language or 
languages should I need to have configured such that my browser displays 
it without warning? French, clearly. But would English do? How about 
German? How does my browser tell whether a combination of letters is a 
valid word in a particular language?
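
To make the difficulty concrete, here is a minimal sketch of the kind of check a language-based client might fall back on: comparing the characters of a label against a per-language alphabet. The ALPHABETS table and languages_covering() are hypothetical and heavily simplified (digits and hyphens are ignored), and the alphabets themselves are assumptions for the example.

```python
import unicodedata

# Hypothetical, simplified alphabets. Real orthographies are fuzzier than
# this, which is part of the problem.
ALPHABETS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "fr": set("abcdefghijklmnopqrstuvwxyzàâæçéèêëîïôœùûüÿ"),
    "de": set("abcdefghijklmnopqrstuvwxyzäöüß"),
}


def languages_covering(label):
    """List the languages whose alphabet contains every character of label."""
    chars = set(unicodedata.normalize("NFC", label.lower()))
    return sorted(lang for lang, alphabet in ALPHABETS.items()
                  if chars <= alphabet)
```

Under these assumptions "café" is covered only by French, so an English or German user would get a warning for a word both may well know. The check also only looks at characters: it cannot tell whether a label is a real word in any of the languages.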

Gerv


More information about the Idna-update mailing list