Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Mark Davis ⌛ mark at macchiato.com
Sat Jul 25 18:44:07 CEST 2009


I agree with Shawn. There are some perfectly reasonable options, but #3 is
not one of them.

   1. Make it PVALID
   2. Make it CONTEXTO but dirt simple.
   3. Wrangle about complicated conditions pointlessly, when there are so,
   so many more characters that cause far worse problems.

A dirt simple approach is to just make sure that there is at least one CJK
character someplace in the label.

KATAKANA MIDDLE DOT:

Rule Set:
     False;
     For All Characters C:
        If Script(C) .in. {Hiragana, Katakana, Han}
        .or. Block(C) .eq. {CJK_Symbols_And_Punctuation} Then True;
     End For;

or in regex:

if (label.find("\u30FB")
  && !label.find("[\p{script=Hiragana|Katakana|Han}\p{block=CJK_Symbols_And_Punctuation}]")) {
    return false;
}
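
For anyone who wants to try the rule on sample labels, here is a minimal
Python sketch of the same check. It is only an illustration: the names
middle_dot_ok, is_cjk and CJK_RANGES are made up for this example, and the
code point ranges are a deliberately incomplete stand-in for the real
Unicode Script and Block properties (the standard library does not expose
the Script property). A regex engine with full Unicode property support
would test the properties directly instead.

```python
KATAKANA_MIDDLE_DOT = "\u30FB"

# Approximate ranges, assumed for this sketch only; they do not cover every
# Hiragana, Katakana, or Han code point.
CJK_RANGES = (
    (0x3000, 0x303F),  # block: CJK Symbols and Punctuation
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters (U+30FB itself is excluded)
    (0x3400, 0x4DBF),  # Han: CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # Han: CJK Unified Ideographs
)


def is_cjk(ch):
    """Return True if ch falls in one of the approximate ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def middle_dot_ok(label):
    """Apply the 'at least one CJK character in the label' rule.

    Labels without KATAKANA MIDDLE DOT are not affected by this rule.
    """
    if KATAKANA_MIDDLE_DOT not in label:
        return True
    return any(is_cjk(ch) for ch in label)
```

With these ranges, a label such as "pay\u30FBpal" is rejected, while
"\u30A2\u30FB\u30A4" (Katakana on both sides of the dot) is accepted.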

Mark


On Sat, Jul 25, 2009 at 00:56, Shawn Steele <Shawn.Steele at microsoft.com> wrote:

> I'm quite concerned about the effort put into a few code points to "solve"
> homograph issues when it does pretty much nothing to solve phishing.  It's
> like locking the windows and keeping the front door wide open.
>
> Phishers aren't going to let a little thing like a middle dot stop them;
> it's just as easy, or easier, to just register "paypal.safest.com" and go
> from there.  I'm redirected to meaningless URLs all the time for legitimate
> order completion service providers.  I'd probably be more suspicious about a
> "floating" period than some of the URLs I have seen.
>
> What this does accomplish is making the spec harder to figure out and
> conform to, while only partially solving corner cases of the security
> problem.
>
> -Shawn
>
> ________________________________________
> From: idna-update-bounces at alvestrand.no [idna-update-bounces at alvestrand.no]
> on behalf of John C Klensin [klensin at jck.com]
> Sent: Saturday, July 25, 2009 12:44 AM
> To: Wil Tan; Mark Davis ⌛
> Cc: Patrik Fältström; IDNA update work; Kenneth Whistler
> Subject: Re: Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)
>
> --On Saturday, July 25, 2009 12:54 +1000 Wil Tan
> <wil at cloudregistry.net> wrote:
>
> > Though I had concerns before, I agree with you that we should
> > just make it PVALID.
> >
> > I still think that the character should be constrained to be
> > used within Japanese context only (which means
> > Han|Hiragana|Katakana|LDH) and not with any script, but it
> > hardly seems worthwhile for us to define a contextual rule for
> > it.
>
> While it can clearly be changed, let's not do this lightly.
> Precisely because "Japanese context" includes Romaji, a
> registry-applied "no mixed script" rule doesn't prevent
> embedding Katakana Middle Dot in an otherwise-all-Latin string.
> And, as I (and, if I recall, Ken and others) have said several
> times, one cannot rely on the appearance of the "normal" Unicode
> glyphs to differentiate characters in cases like this, but must
> accept both wide differences in font design and "you see what
> you expect to see" user expectations. IMO, we should, in that
> regard, pay special attention to the expectations of
> undecorated, or mostly-undecorated, Latin-character users
> because they have many years of history leading them to expect
> that Internet text is, by default, ASCII.
>
> In addition, things that could be confused with dots should be
> treated with special care because we already know that one of
> the more popular phishing tricks is to create a URL in which one
> popular and likely domain, such as a banking site or popular
> storefront, is confused by the user with a more complicated URL
> pointing to a malware or identity theft site.
>
> In summary, I think it is clear that the particular contextual
> rule associated with this character should be correct and should
> optimally balance having a narrow and precise rule with
> complexity.  I'm not competent to get that right and am very
> pleased that those of you who are competent have taken that task
> over.  But it seems to me that just making it PVALID is
> problematic enough to require a much stronger argument than
> we've seen... especially at this late date.
>
>    john