looking up unassigned characters
vint at google.com
Sun Mar 22 19:35:13 CET 2009
Thanks. I made a mistake in the formulation of that question, and I
have updated it to remove the mention of DISALLOWED, per both your
recommendation and Paul Hoffman's.
1818 Library Street, Suite 400
Reston, VA 20190
On Mar 22, 2009, at 2:31 PM, Erik van der Poel wrote:
> Final question from the summary:
>> B. There are few, if any, restrictions on the lookup phase of IDNAv2
>> (and IDNA2003). The consequence is that lookup will match
>> domain names injected into DNS by registries that do not conform
>> to the registration restrictions intended by the protocol.
>> This condition arises from permitting lookup of DISALLOWED
>> or UNASSIGNED characters. How serious a problem is this in the
>> view of the WG?
> Actually, it's not true that IDNA2003 allows the lookup of
> "prohibited" (i.e., DISALLOWED) characters. There is a flag,
> AllowUnassigned, that allows the lookup of characters that were
> unassigned in Unicode 3.2.
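The IDNA2003 distinction can be sketched with Python's standard-library `stringprep` module, whose tables use Unicode 3.2 data as RFC 3454 requires. This is a minimal sketch, not a full Nameprep implementation: it omits mapping, NFKC normalization and the bidi rules, and `check_label` is a hypothetical name. It shows that prohibited characters are rejected whatever the flag says, while code points unassigned in Unicode 3.2 (table A.1) pass only when the AllowUnassigned-style flag is set.

```python
import stringprep

# Prohibition tables applied by Nameprep (RFC 3491). The checkers come from
# Python's stdlib stringprep module, which uses Unicode 3.2 data.
_PROHIBITED = (
    stringprep.in_table_c12,  # C.1.2 non-ASCII space characters
    stringprep.in_table_c22,  # C.2.2 non-ASCII control characters
    stringprep.in_table_c3,   # C.3 private use
    stringprep.in_table_c4,   # C.4 non-character code points
    stringprep.in_table_c5,   # C.5 surrogate codes
    stringprep.in_table_c6,   # C.6 inappropriate for plain text
    stringprep.in_table_c7,   # C.7 inappropriate for canonical representation
    stringprep.in_table_c8,   # C.8 change display properties or deprecated
    stringprep.in_table_c9,   # C.9 tagging characters
)


def check_label(label: str, allow_unassigned: bool) -> None:
    """Raise ValueError if `label` fails the IDNA2003 character checks.

    Hypothetical helper: character checks only, with no mapping,
    normalization or bidi rules.
    """
    for ch in label:
        # Prohibited characters fail regardless of the flag.
        if any(in_table(ch) for in_table in _PROHIBITED):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
        # Table A.1: code points that were unassigned in Unicode 3.2.
        if stringprep.in_table_a1(ch) and not allow_unassigned:
            raise ValueError(f"U+{ord(ch):04X} is unassigned in Unicode 3.2")


# U+0221 was added after Unicode 3.2, so it passes only with the flag set.
check_label("\u0221", allow_unassigned=True)
```

A private-use character such as U+E000 is rejected even with `allow_unassigned=True`, which mirrors the point that IDNA2003 never permits the lookup of prohibited characters.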
> From my point of view, it would be nice if clients were allowed to
> look up unassigned characters. Put yourself in the shoes of a Web
> search engine developer. You can update your crawler to support the
> latest version of IDNA. However, you cannot get all of your users to
> update their browsers very quickly. This is why Google emits URIs with
> Punycode (because MSIE6 does not perform IDNA).
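Emitting Punycode means the server performs the IDNA ToASCII step itself, so a browser without IDNA support only ever sees ASCII hostnames. A minimal sketch using Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490); `ascii_href` and the example URLs are hypothetical, and this is not a description of how Google actually does it.

```python
from urllib.parse import urlsplit


def ascii_href(url: str) -> str:
    """Return `url` with its hostname converted to ACE (Punycode) form.

    Assumes a DNS hostname (not an IP literal) and no userinfo part,
    which this sketch would drop.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        return url
    # Python's "idna" codec applies IDNA2003 ToASCII to each label.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return parts._replace(netloc=netloc).geturl()
```

For example, `ascii_href("http://bücher.example/page")` yields `http://xn--bcher-kva.example/page`, which an IDNA-unaware browser can resolve directly.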
> Now, a document that matches the user's search query very well is
> typically pushed up to the top of the first page of search results. If
> that document's URI happens to use newer Unicode characters, the
> user's browser may not be able to convert such Unicode labels to ASCII
> labels, so it would be helpful if the search engine performed
> the ASCII conversion itself.
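Server-side conversion of a label that uses newer characters could be sketched as follows. It assumes the label has already been validated and normalized by whichever newer IDNA implementation the crawler uses (not shown); the sketch only applies RFC 3492 Punycode through Python's built-in `punycode` codec and adds the `xn--` ACE prefix. `label_to_ace` and `host_to_ace` are hypothetical names.

```python
def label_to_ace(label: str) -> str:
    """Encode one already-validated label as an ACE label.

    Assumption: validation and normalization happened earlier, in the
    crawler's own (newer) IDNA implementation.
    """
    if label.isascii():
        return label  # ASCII labels need no conversion
    # Punycode encoding of the label, prefixed with the ACE marker.
    return "xn--" + label.encode("punycode").decode("ascii")


def host_to_ace(host: str) -> str:
    """Convert each dot-separated label of `host` independently."""
    return ".".join(label_to_ace(label) for label in host.split("."))
```

Because Punycode itself is independent of the Unicode version, the server can emit ASCII labels for characters that an older browser's IDNA tables would reject.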
> Of course, the old browser may not be able to display new Unicode
> characters either. So it would be prudent for the search engine to
> refrain from displaying the Unicode characters directly. Instead, it
> might present a small link to a warning page that explains why the URL
> hasn't been displayed. Likewise, the browser might refrain from
> displaying new Unicode characters.
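The display decision might look like this sketch, which treats Unicode 3.2 (the version IDNA2003 is tied to) as a stand-in for what an old browser can render; that cutoff is an assumption, as are the names `has_post_3_2_chars` and `display_host`. It relies on `unicodedata.ucd_3_2_0`, the frozen Unicode 3.2 database that Python ships for IDNA use.

```python
import unicodedata

# Frozen Unicode 3.2 database bundled with Python for IDNA2003 use.
_UCD_3_2 = unicodedata.ucd_3_2_0


def has_post_3_2_chars(host: str) -> bool:
    """True if `host` contains a code point unassigned in Unicode 3.2.

    General category "Cn" covers unassigned code points (and also
    noncharacters, which IDNA2003 prohibits anyway).
    """
    return any(_UCD_3_2.category(ch) == "Cn" for ch in host)


def display_host(host: str, ace_host: str) -> str:
    """Show the ACE form instead of characters an old client may not render."""
    return ace_host if has_post_3_2_chars(host) else host
```

A real search page would presumably pair the ACE form with the warning link described above; that part is not modeled here.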
> Also, this could be abused by phishers who try to collect passwords
> and the like from unsuspecting users. This is part of the broader
> phishing problem, which can be attacked in a number of different ways,
> including careful display, user education and services that warn users
> that particular URIs have been discovered to be at phishing sites.
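One building block of such a warning service is a lookup key under which the Unicode and Punycode spellings of a name refer to the same host. A minimal sketch, assuming a hypothetical in-memory set of flagged hosts stored in lowercase ACE form (real warning services have their own interfaces, which this does not model); it uses Python's IDNA2003 `idna` codec, and the flagged entry in the usage line is placeholder data.

```python
from typing import AbstractSet


def ace_key(host: str) -> str:
    """Canonical key: IDNA2003 ToASCII, lowercased, without a trailing dot."""
    # ACE labels are case-insensitive, and the codec leaves all-ASCII
    # input unchanged, so lowercase explicitly.
    return host.rstrip(".").encode("idna").decode("ascii").lower()


def is_flagged(host: str, flagged_ace_hosts: AbstractSet[str]) -> bool:
    """True if `host` matches a (hypothetical) flagged-host list."""
    return ace_key(host) in flagged_ace_hosts


# Placeholder data: the Unicode spelling matches the stored ACE entry.
print(is_flagged("Bücher.example", {"xn--bcher-kva.example"}))
```

Comparing in ACE form means a phishing host is caught whether a link spells it in Unicode or in Punycode.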
> Idna-update mailing list
> Idna-update at alvestrand.no