What rules have been used for the current list of codepoints?

Fri Dec 15 01:45:12 CET 2006

Vint,

The way I see it, the current discussion is about the set of
characters to allow in IDNs, both post-IDNA, wrapped in Punycode in
DNS packets, and pre-IDNA in various encodings in e.g. HTML. In order
to accommodate various languages, this set looks like it will be quite
large, and there will be characters that look like each other. The way
to prevent spoofs is to avoid showing unfamiliar characters to users.
Microsoft does this by looking at the user's chosen language(s) in the
Accept-Language header. When an IDN is found to contain characters
outside the user's language(s), the IDN should not be shown as
Unicode, since it may be a spoof. Nor should it be shown as Punycode,
since some of those are tricky too, e.g. xn--intel. Various user
interfaces are possible, but one possibility is to show a red or
dotted rectangle.
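
As a rough illustration of the display rule described above, here is a
minimal Python sketch. It is not Microsoft's implementation: the names
ALLOWED, WARNING, parse_accept_language and display_form are made up for
this example, and the two character sets are placeholders for tables that
registries and implementors would have to agree on.

```python
# Hypothetical per-language repertoires (assumption: real tables would be
# agreed between registries and implementors, and would be far larger).
ALLOWED = {
    "en": set("abcdefghijklmnopqrstuvwxyz0123456789-"),
    "ru": set("абвгдежзийклмнопрстуфхцчшщъыьэюяё0123456789-"),
}

# Placeholder for whatever the user interface shows instead of the name.
WARNING = "[unverified name]"


def parse_accept_language(header):
    """Return the primary language subtags of an Accept-Language value, in order."""
    langs = []
    for item in header.split(","):
        tag = item.split(";")[0].strip().lower()  # drop any ;q=... weight
        if tag and tag != "*":
            primary = tag.split("-")[0]  # "en-US" -> "en"
            if primary not in langs:
                langs.append(primary)
    return langs


def display_form(label, accept_language):
    """Return the Unicode label if all of its characters belong to the
    user's language(s); otherwise return the warning placeholder."""
    allowed = set()
    for lang in parse_accept_language(accept_language):
        # Union over all of the user's languages; unknown languages add nothing.
        allowed |= ALLOWED.get(lang, set())
    if all(ch in allowed for ch in label):
        return label
    return WARNING
```

Note that the sketch never falls back to the Punycode form: a label that
fails the check is replaced by the placeholder, in line with the point
about names such as xn--intel.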

This means that two things will be needed to ensure end-to-end
interoperability: one is the current exercise to prevent certain
characters at the protocol level(s), and the other is at the two ends,
namely to get the registries and implementors to agree on the set of
characters to use for each language. For example, if Verisign and
Microsoft could agree on the set of characters to use for Japanese,
then Verisign could offer those characters to their Japanese
registrants, who would then be assured that Microsoft would display
those characters to Japanese users.

Japanese domain names would not be shown to most American users, and
that's OK because they can't read them anyway!

And most Americans are not interested in doing business with a paypal
that has one of the letters in Cyrillic, so you just don't show them
the spoofed name. You show them a warning instead.
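
To make the paypal case concrete, the sketch below builds such a name,
converts it to the form carried in DNS packets, and flags the mix of
scripts. The helpers scripts_in and is_mixed_script are hypothetical, and
guessing a script from the first word of the Unicode character name is a
crude heuristic chosen for brevity, not a proper script lookup. Python's
built-in "idna" codec implements the 2003 IDNA rules.

```python
import unicodedata


def scripts_in(label):
    """Crude script guess: the first word of each letter's Unicode name,
    e.g. "LATIN" for "LATIN SMALL LETTER P". Digits and hyphens are skipped."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label):
    """True if the letters of the label come from more than one script."""
    return len(scripts_in(label)) > 1


# "paypal" with its second letter replaced by U+0430 CYRILLIC SMALL LETTER A.
spoof = "p\u0430ypal"

# Post-IDNA form (Punycode with the "xn--" prefix), as sent in DNS packets.
ace = spoof.encode("idna")

if is_mixed_script(spoof):
    print("warning: name mixes scripts:", sorted(scripts_in(spoof)))
```

A check like this catches only the mixed-script case; a name written
entirely in one unfamiliar script still needs the per-language test.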

Erik

On 12/14/06, Vint Cerf <vint at google.com> wrote:
> I have been following along, perhaps with less understanding than many, but
> I continue to have concerns that we are not always distinguishing that which
> is needed for expressive natural language, and that which is safe, stable,
> and secure for Internet domain names.
>
> Vint
>
>
>
> Vinton G Cerf
> Chief Internet Evangelist
> Google
> Regus Suite 384
> 13800 Coppermine Road
> Herndon, VA 20171
>
> +1 703 234-1823
> +1 703-234-5822 (f)
>
> vint at google.com
> www.google.com
>
>
> -----Original Message-----
> From: idna-update-bounces at alvestrand.no
> [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Michael Everson
> Sent: Thursday, December 14, 2006 5:10 AM
> To: idna-update at alvestrand.no
> Subject: Re: What rules have been used for the current list of codepoints?
>
> At 09:51 +0100 2006-12-14, Patrik Fältström wrote:
> >On 13 dec 2006, at 17.45, Kenneth Whistler wrote:
> >
> >>, but *I* am absolutely sure that Lm and Nd need to be added.
> >
> >Exactly. A good start of finding a consensus.
> >Crisp statement, easy to say "yes" or "no" to.
> >
> >Hmmm...I have got some requests to not include IPA Extensions, and here
> >might be an issue where I do not see consensus.
>
> , but *I* am absolutely sure that you cannot exclude characters from this
> block by excluding the block. This will deny IDN to millions of people.
>
> Is that clear enough?
> --
> Michael Everson * http://www.evertype.com
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>