What rules have been used for the current list of codepoints?

Mark Davis mark.davis at icu-project.org
Sat Dec 16 21:38:32 CET 2006


I think some of the statements on this thread are based on misinformation.

There are a number of different useful techniques for reducing spoofing.
According to Michel, what IE7 actually does is look at the languages set in
the browser and deduce from them a set of scripts. Any name that contains
characters outside of those scripts (or fails in certain other ways) is
shown in the raw punycode form. Here are some illustrative cases that I
just gathered:

  User entry: www.þorn.is
  Displayed:  http://www.þorn.is/
  Comments:   þ is Latin

  User entry: bäcker.com
  Firefox:    http://xn--bcker-gra.com/
  IE7:        http://bäcker.com/

  User entry: путин.museum
  Firefox:    http://путин.museum/
  IE7:        http://xn--h1akeme.museum/

  User entry: I♥NY.museum
  Firefox:    http://i♥ny.museum/
  IE7:        http://xn--iny-zx5a.museum/

  User entry: pаypal.museum
  Firefox:    http://pаypal.museum/
  IE7:        http://xn--pypal-4ve.museum/
  Comments:   'a' is Cyrillic

  User entry: ibm.com⁄foo.museum
  Displayed:  http://amazon.xn--comfoo-rq0c.museum/
  Comments:   fraction-slash

Those results are from my browsers, whose language lists do not include
Russian but do include at least one European language. If I add Russian to
my browser's languages in IE7, then путин.museum is allowed, but
pаypal.museum is still not (mixed scripts). This is just what the browser
does with the URL, not whether the name is permitted by the registry.
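
To make the behaviour described above concrete, here is a rough sketch in
Python. It is not IE7's code, only my reading of the rule: derive the allowed
scripts from the language list, then show a label as-is only if it uses a
single allowed script. The language-to-script table, the name-prefix script
lookup, and the function names are all assumptions made for illustration, and
applying the check per label is also an assumption.

```python
import unicodedata

# Hypothetical mapping from browser language tags to scripts (illustrative only).
LANGUAGE_SCRIPTS = {
    "en": {"Latin"},
    "de": {"Latin"},
    "is": {"Latin"},
    "ru": {"Cyrillic"},
}

# Crude script lookup keyed on the first word of the Unicode character name.
# A real implementation would use the Unicode Script property instead.
NAME_PREFIX_TO_SCRIPT = {"LATIN": "Latin", "CYRILLIC": "Cyrillic", "GREEK": "Greek"}


def script_of(ch):
    """Approximate the script of a single character."""
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "Common"  # digits and hyphen can appear with any script
    prefix = unicodedata.name(ch, "").split(" ", 1)[0]
    return NAME_PREFIX_TO_SCRIPT.get(prefix, "Other")


def display_label(label, languages):
    """Return the label as shown to the user: Unicode or raw punycode."""
    label = label.lower()  # simplification; real IDNA processing uses nameprep
    if label.isascii():
        return label  # plain ASCII labels are always shown unchanged
    allowed = set()
    for lang in languages:
        allowed |= LANGUAGE_SCRIPTS.get(lang, set())
    scripts = {script_of(ch) for ch in label} - {"Common"}
    # Show Unicode only for a single script that the user's languages imply.
    if len(scripts) == 1 and scripts <= allowed:
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def display_domain(domain, languages):
    """Apply the per-label check to every label of a domain name."""
    return ".".join(display_label(part, languages) for part in domain.split("."))
```

Under these assumptions, display_domain("bäcker.com", ["en"]) leaves the name
alone, display_domain("путин.museum", ["en"]) falls back to
xn--h1akeme.museum, and adding "ru" to the list allows путин.museum while
pаypal.museum (Cyrillic 'а' mixed with Latin) still comes out as
xn--pypal-4ve.museum.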

Thus IE7 works at the level of scripts rather than particular languages, and
it does not have to see whether "a combination of letters is a valid word in
a given language". One could have achieved the same end with a check-off
list of scripts, although I think the choice of using the language list
(which the browsers already have) nicely avoids having an extra UI.

We'll see over time which type of interface users prefer. (As for myself,
although I use Firefox generally as my browser, I prefer the IE7 approach in
this case since regardless of the registry it catches bad URLs like
pаypal.museum, but lets me use reasonable URLs like bäcker.com.)

Mark

On 12/16/06, Erik van der Poel <erikv at google.com> wrote:
>
> On 12/16/06, Gervase Markham <gerv at mozilla.org> wrote:
> > If the side of the bus says one thing, but I type another thing
> > that looks the same, then I end up in the wrong place. This is a
> > security risk, depending on the ownership of the "wrong place".
> > Therefore, registries should not allow this situation.
>
> I agree that, ideally, registries would not allow this situation.
>
> > So if I speak four languages, I will get security warnings for the three
> > I speak perfectly that aren't the user interface language?
>
> I wasn't really talking about "security warnings". I mentioned one
> idea for displaying domain names to the user. If the name contains
> unfamiliar characters, it is not shown as is. Instead, say, a shaded
> rectangle, about the same size as the original name, is shown. This is
> just one example of a possible UI.
>
> If the four languages you speak are "close" to each other and use
> mostly the same characters, perhaps such groups of languages would be
> treated as, well, groups. English, French and German are sufficiently
> "close" that showing an e-acute to an English speaker is probably OK.
>
> > You seem to have this idea that language communities exist in their own
> > isolated islands, all reading their own separate advertising and
> > content. If things worked this way, why do we not have a Japanese
> > Internet, a separate English Internet, and a separate Chinese Internet?
>
> No, we have a single Internet, and a lot of people speak English as a
> 2nd language, so email that crosses international boundaries is often
> in English (like this one).
>
> This does not prevent companies from setting up Web sites in multiple
> languages.
>
> > I suspect that having them not work in between 15 and 40% of users'
> > browsers (depending on which country you are in) would be a significant
> > disincentive to deploying a registered IDN domain name.
>
> Ideally, all registries would adopt and enforce sensible policies like
> the ones you mention. In the meantime, some software such as
> Microsoft's will probably continue to display some of the .com IDNs. I
> guess time will tell.
>
> Erik