I agree with Shawn. There are some perfectly reasonable options, but #3 is not one of them.<br><ol><li>Make it PVALID</li><li>Make it CONTEXTO but dirt simple.</li><li>Wrangle about complicated conditions pointlessly, when there are so, so many more characters that cause far worse problems.</li>
</ol>A dirt simple approach is to just make sure that there is at least one CJK character someplace in the label.<br><br>KATAKANA MIDDLE DOT:<br><br>Rule Set:<br> False;<br> For All Characters C:<br>
If Script(C) <span>.in. {Hiragana, Katakana, Han}</span><br>
.or. Block(C) <span>.eq. {CJK_Symbols_And_Punctuation}</span> Then True;<br>
End For;<br><br>or in regex:<br><br>if (label.find("\u30FB") <br> && !label.find("[\p{script=<span>Hiragana|Katakana|Han</span>}\p{block=<span>CJK_Symbols_And_Punctuation</span>}]")) {<br>
return false;<br>}<br clear="all">Mark<br>
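For anyone who wants to try the rule, here is a minimal Java sketch of the same check. It is an illustration under stated assumptions, not proposed spec text: the class and method names (`MiddleDotRule`, `hasCjkContext`, `isAllowed`) are made up for this example, and it relies on whatever Unicode script and block data ships with the Java runtime in use. It uses the `Character.UnicodeScript` and `Character.UnicodeBlock` lookups instead of a regex, so no particular regex property syntax is assumed.

```java
import java.lang.Character.UnicodeBlock;
import java.lang.Character.UnicodeScript;

// Hypothetical helper illustrating the "at least one CJK character" rule.
final class MiddleDotRule {
    // U+30FB KATAKANA MIDDLE DOT
    static final int KATAKANA_MIDDLE_DOT = 0x30FB;

    // True if any code point in the label has script Hiragana, Katakana, or Han,
    // or lies in the CJK Symbols and Punctuation block.
    static boolean hasCjkContext(String label) {
        return label.codePoints().anyMatch(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            return script == UnicodeScript.HIRAGANA
                    || script == UnicodeScript.KATAKANA
                    || script == UnicodeScript.HAN
                    || UnicodeBlock.of(cp) == UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION;
        });
    }

    // A label passes if it has no middle dot, or if it has one plus CJK context.
    static boolean isAllowed(String label) {
        boolean hasMiddleDot = label.codePoints().anyMatch(cp -> cp == KATAKANA_MIDDLE_DOT);
        return !hasMiddleDot || hasCjkContext(label);
    }
}
```

Calling `MiddleDotRule.isAllowed("a\u30FBb")` rejects a middle dot embedded in an otherwise all-Latin label, while `MiddleDotRule.isAllowed("\u30A2\u30FBb")` accepts it because the label also contains a Katakana letter. The middle dot does not satisfy the rule by itself, since its script is Common and it sits in the Katakana block rather than in CJK Symbols and Punctuation.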
<br><br><div class="gmail_quote">On Sat, Jul 25, 2009 at 00:56, Shawn Steele <span dir="ltr"><<a href="mailto:Shawn.Steele@microsoft.com">Shawn.Steele@microsoft.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I'm quite concerned about the effort put into a few code points to "solve" homograph issues when it does pretty much nothing to solve phishing. It's like locking the windows and keeping the front door wide open.<br>
<br>
Phishers aren't going to let a little thing like a middle dot stop them; it's just as easy, or easier, to register "<a href="http://paypal.safest.com" target="_blank">paypal.safest.com</a>" and go from there. I'm redirected to meaningless URLs all the time for legitimate order completion service providers. I'd probably be more suspicious about a "floating" period than some of the URLs I have seen.<br>
<br>
What this does accomplish is making the spec harder to figure out and conform to, while only partially solving corner cases of the security problem.<br>
<br>
-Shawn<br>
<br>
________________________________________<br>
From: <a href="mailto:idna-update-bounces@alvestrand.no">idna-update-bounces@alvestrand.no</a> [<a href="mailto:idna-update-bounces@alvestrand.no">idna-update-bounces@alvestrand.no</a>] on behalf of John C Klensin [<a href="mailto:klensin@jck.com">klensin@jck.com</a>]<br>
Sent: Saturday, July 25, 2009 12:44 AM<br>
To: Wil Tan; Mark Davis ⌛<br>
Cc: Patrik Fältström; IDNA update work; Kenneth Whistler<br>
Subject: Re: Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)<br>
<div><div></div><div class="h5"><br>
--On Saturday, July 25, 2009 12:54 +1000 Wil Tan<br>
<<a href="mailto:wil@cloudregistry.net">wil@cloudregistry.net</a>> wrote:<br>
<br>
> Though I had concerns before, I agree with you that we should<br>
> just make it PVALID.<br>
><br>
> I still think that the character should be constrained to be<br>
> used within Japanese context only (which means<br>
> Han|Hiragana|Katakana|LDH) and not with any script, but it<br>
> hardly seems worthwhile for us to define a contextual rule for<br>
> it.<br>
<br>
While it can clearly be changed, let's not do this lightly.<br>
Precisely because "Japanese context" includes Romaji, a<br>
registry-applied "no mixed script" rule doesn't prevent<br>
embedding Katakana Middle Dot in an otherwise-all-Latin string.<br>
And, as I (and, if I recall, Ken and others) have said several<br>
times, one cannot rely on the appearance of the "normal" Unicode<br>
glyphs to differentiate characters in cases like this, but must<br>
accept both wide differences in font design and "you see what<br>
you expect to see" user expectations. IMO, we should, in that<br>
regard, pay special attention to the expectations of<br>
undecorated, or mostly-undecorated, Latin-character users<br>
because they have many years of history leading them to expect<br>
that Internet text is, by default, ASCII.<br>
<br>
In addition, things that could be confused with dots should be<br>
treated with special care because we already know that one of<br>
the more popular phishing tricks is to create a URL in which one<br>
popular and likely domain, such as a banking site or popular<br>
storefront, is confused by the user with a more complicated URL<br>
pointing to a malware or identity theft site.<br>
<br>
In summary, I think it is clear that the particular contextual<br>
rule associated with this character should be correct and should<br>
optimally balance having a narrow and precise rule with<br>
complexity. I'm not competent to get that right and am very<br>
pleased that those of you who are competent have taken that task<br>
over. But it seems to me that just making it PVALID is<br>
problematic enough to require a much stronger argument than<br>
we've seen... especially at this late date.<br>
<br>
john<br>
<br>
</div></div><div><div></div><div class="h5">
</div></div></blockquote></div><br>