looking up unassigned characters

Vint Cerf vint at google.com
Sun Mar 22 19:35:13 CET 2009


Erik,

Thanks. I made a mistake in formulating that question, and I have
updated it to remove the mention of DISALLOWED, per both your and
Paul Hoffman's recommendations.

v


Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com




On Mar 22, 2009, at 2:31 PM, Erik van der Poel wrote:

> Final question from the summary:
>
>> B. There are few if any restrictions on the lookup phase of IDNAv2
>> (and IDNA2003). The consequences are that lookup will match
>> domain names injected into DNS by registries that are non-conformant
>> with registration restrictions intended by the protocol specification.
>> This condition arises from permitting the looking up of DISALLOWED
>> or UNASSIGNED characters. How serious a problem is this in the
>> view of the WG?
>
> Actually, it's not true that IDNA2003 allows the lookup of
> "prohibited" (i.e. DISALLOWED) characters. There is a flag that allows
> the lookup of characters that were unassigned in Unicode 3.2.
>
> From my point of view, it would be nice if clients were allowed to
> look up unassigned characters. Put yourself in the shoes of a Web
> search engine developer. You can update your crawler to support the
> latest version of IDNA. However, you cannot get all of your users to
> update their browsers very quickly. This is why Google emits URIs with
> Punycode (because MSIE6 does not perform IDNA).
>
> Now, a document that matches the user's search query very well is
> typically pushed up to the top of the first page of search results. If
> that document's URI happens to use newer Unicode characters, the
> user's browser may not be able to convert such Unicode labels to ASCII
> labels, and so it would be great if the search engine would perform
> the ASCII conversion.
>
> Of course, the old browser may not be able to display new Unicode
> characters either. So it would be prudent for the search engine to
> refrain from displaying the Unicode characters directly. Instead, it
> might present a small link to a warning page that explains why the URL
> hasn't been displayed. Likewise, the browser might refrain from
> displaying new Unicode characters too.
>
> Also, this could be abused by phishers who try to collect passwords
> and the like from unsuspecting users. This is part of the broader
> phishing problem, which can be attacked in a number of different ways,
> including careful display, user education and services that warn users
> that particular URIs have been discovered to be at phishing sites.
>
> Erik

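A minimal sketch of the lookup-side flag discussed in the quoted message,
assuming Python and its standard stringprep module, whose in_table_a1()
tests membership in RFC 3454 table A.1 (code points unassigned in
Unicode 3.2). The function names and the allow_unassigned parameter are
hypothetical, and the conversion is deliberately simplified: Nameprep
mapping, normalization and the prohibited-character checks are left out.

```python
import stringprep

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def unassigned_in_unicode_3_2(label: str) -> list[str]:
    """Return the characters of label that were unassigned in Unicode 3.2."""
    return [ch for ch in label if stringprep.in_table_a1(ch)]


def to_ascii_label(label: str, allow_unassigned: bool) -> str:
    """Convert a single label to ASCII form, honouring the flag.

    Simplified sketch: no Nameprep mapping, normalization or
    prohibited-character checks are performed here.
    """
    if not allow_unassigned:
        bad = unassigned_in_unicode_3_2(label)
        if bad:
            raise ValueError(
                "label contains code points unassigned in Unicode 3.2: "
                + ", ".join(f"U+{ord(ch):04X}" for ch in bad)
            )
    if label.isascii():
        return label  # already an ASCII label, nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")
```

With allow_unassigned=True, a label containing U+0221 (assigned only
after Unicode 3.2) is converted, which is the behaviour a search engine
would want when emitting ASCII URIs on behalf of older browsers. With
allow_unassigned=False the same label is rejected.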