Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Shawn Steele Shawn.Steele at microsoft.com
Sat Jul 25 09:56:57 CEST 2009


I'm quite concerned about the effort put into a few code points to "solve" homograph issues when it does pretty much nothing to solve phishing.  It's like locking the windows and keeping the front door wide open.

Phishers aren't going to let a little thing like a middle dot stop them; it's just as easy, or easier, to register "paypal.safest.com" and go from there.  I'm redirected to meaningless URLs all the time for legitimate order completion service providers.  I'd probably be more suspicious about a "floating" period than some of the URLs I have seen.

What this does accomplish is making the spec harder to figure out and conform to, while only partially solving corner cases of the security problem.

-Shawn

________________________________________
From: idna-update-bounces at alvestrand.no [idna-update-bounces at alvestrand.no] on behalf of John C Klensin [klensin at jck.com]
Sent: Saturday, July 25, 2009 12:44 AM
To: Wil Tan; Mark Davis ⌛
Cc: Patrik Fältström; IDNA update work; Kenneth Whistler
Subject: Re: Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

--On Saturday, July 25, 2009 12:54 +1000 Wil Tan
<wil at cloudregistry.net> wrote:

> Though I had concerns before, I agree with you that we should
> just make it PVALID.
>
> I still think that the character should be constrained to be
> used within Japanese context only (which means
> Han|Hiragana|Katakana|LDH) and not with any script, but it
> hardly seems worthwhile for us to define a contextual rule for
> it.

While it can clearly be changed, let's not do this lightly.
Precisely because "Japanese context" includes Romaji, a
registry-applied "no mixed script" rule doesn't prevent
embedding Katakana Middle Dot in an otherwise-all-Latin string.
And, as I (and, if I recall, Ken and others) have said several
times, one cannot rely on the appearance of the "normal" Unicode
glyphs to differentiate characters in cases like this, but must
accept both wide differences in font design and "you see what
you expect to see" user expectations.   IMO, we should, in that
regard, pay special attention to the expectations of
undecorated, or mostly-undecorated, Latin-character users
because they have many years of history leading them to expect
that Internet text is, by default, ASCII.
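
To make the Romaji point concrete, here is a minimal Python sketch
comparing two hypothetical checks.  Neither is taken from the draft
tables: the function names are invented for illustration, and scripts
are approximated from Unicode character names because Python's
unicodedata module does not expose the Script property.  The first
check models a "Han|Hiragana|Katakana|LDH only" rule; the second
requires at least one Japanese-script character whenever the middle
dot appears.

```python
import unicodedata

MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: character-name prefixes stand in for the Unicode Script
# property, which the standard library does not provide.
JAPANESE_NAME_PREFIXES = (
    "HIRAGANA LETTER",
    "KATAKANA LETTER",
    "CJK UNIFIED IDEOGRAPH",
)


def is_japanese_script(ch):
    """Rough test for Hiragana, Katakana, or Han characters."""
    return unicodedata.name(ch, "").startswith(JAPANESE_NAME_PREFIXES)


def is_ldh(ch):
    """ASCII letter, digit, or hyphen."""
    return ch.isascii() and (ch.isalnum() or ch == "-")


def no_mixed_script_rule(label):
    """Hypothetical registry rule: only Han|Hiragana|Katakana|LDH allowed
    alongside the middle dot."""
    return all(
        ch == MIDDLE_DOT or is_japanese_script(ch) or is_ldh(ch)
        for ch in label
    )


def requires_japanese_context(label):
    """Hypothetical contextual rule: a middle dot is acceptable only if the
    label also contains at least one Japanese-script character."""
    if MIDDLE_DOT not in label:
        return True
    return any(is_japanese_script(ch) for ch in label)


all_latin_with_dot = "pay" + MIDDLE_DOT + "pal"
print(no_mixed_script_rule(all_latin_with_dot))       # passes the first rule
print(requires_japanese_context(all_latin_with_dot))  # rejected by the second
```

Under these assumptions the all-Latin label passes the first check,
because LDH is part of the permitted set, and fails only the second.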

In addition, things that could be confused with dots should be
treated with special care because we already know that one of
the more popular phishing tricks is to create a URL in which one
popular and likely domain, such as a banking site or popular
storefront, is confused by the user with a more complicated URL
pointing to a malware or identity theft site.

In summary, I think it is clear that the particular contextual
rule associated with this character should be correct and should
optimally balance having a narrow and precise rule with
complexity.  I'm not competent to get that right and am very
pleased that those of you who are competent have taken that task
over.  But it seems to me that just making it PVALID is
problematic enough to require a much stronger argument than
we've seen... especially at this late date.

    john
