Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Wil Tan wil at cloudregistry.net
Sun Jul 26 09:04:17 CEST 2009


I'm fine with this, though I'd prefer the "dirt simple" rule to be a plain
"True". The advantage over (1) is that it leaves room for explanation and
warnings to registries and developers; the advantage over your proposed
algorithm is that it doesn't prohibit labels that otherwise consist entirely
of Latin characters (decorated or not).

=wil

On Sun, Jul 26, 2009 at 2:44 AM, Mark Davis ⌛ <mark at macchiato.com> wrote:

> I agree with Shawn. There are some perfectly reasonable options, but #3 is
> not one of them.
>
>    1. Make it PVALID
>    2. Make it CONTEXTO but dirt simple.
>    3. Wrangle about complicated conditions pointlessly, when there are so,
>    so many more characters that cause far worse problems.
>
> A dirt simple approach is to just make sure that there is at least one CJK
> character someplace in the label.
>
> KATAKANA MIDDLE DOT:
>
> Rule Set:
>      False;
>      For All Characters C:
>         If Script(C) .in. {Hiragana, Katakana, Han}
>         .or. Block(C) .eq. {CJK_Symbols_And_Punctuation} Then True;
>      End For;
>
> or in regex:
>
> if (label.find("\u30FB")
>   && !label.find("[\p{script=Hiragana|Katakana|Han}"
>                  + "\p{block=CJK_Symbols_And_Punctuation}]")) {
>     // middle dot present, but no CJK character anywhere in the label
>     return false;
> }
> Mark
>
>
>
> On Sat, Jul 25, 2009 at 00:56, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
>
>> I'm quite concerned about the effort put into a few code points to "solve"
>> homograph issues when it does pretty much nothing to solve phishing.  It's
>> like locking the windows and keeping the front door wide open.
>>
>> Phishers aren't going to let a little thing like a middle dot stop them;
>> it's just as easy, or easier, to just register "paypal.safest.com" and go
>> from there.  I'm redirected to meaningless URLs all the time for legitimate
>> order completion service providers.  I'd probably be more suspicious about a
>> "floating" period than some of the URLs I have seen.
>>
>> What this does accomplish is making the spec harder to figure out and
>> conform to, while only partially solving corner cases of the security
>> problem.
>>
>> -Shawn
>>
>> ________________________________________
>> From: idna-update-bounces at alvestrand.no
>> [idna-update-bounces at alvestrand.no] on behalf of John C Klensin
>> [klensin at jck.com]
>> Sent: Saturday, July 25, 2009 12:44 AM
>> To: Wil Tan; Mark Davis ⌛
>> Cc: Patrik Fältström; IDNA update work; Kenneth Whistler
>> Subject: Re: Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6,
>> A.9)
>>
>> --On Saturday, July 25, 2009 12:54 +1000 Wil Tan
>> <wil at cloudregistry.net> wrote:
>>
>> > Though I had concerns before, I agree with you that we should
>> > just make it PVALID.
>> >
>> > I still think that the character should be constrained to be
>> > used within Japanese context only (which means
>> > Han|Hiragana|Katakana|LDH) and not with any script, but it
>> > hardly seems worthwhile for us to define a contextual rule for
>> > it.
>>
>> While it can clearly be changed, let's not do this lightly.
>> Precisely because "Japanese context" includes Romaji, a
>> registry-applied "no mixed script" rule doesn't prevent
>> embedding Katakana Middle Dot in an otherwise-all-Latin string.
>> And, as I (and, if I recall, Ken and others) have said several
>> times, one cannot rely on the appearance of the "normal" Unicode
>> glyphs to differentiate characters in cases like this, but must
>> accept both wide differences in font design and "you see what
>> you expect to see" user expectations.   IMO, we should, in that
>> regard, pay special attention to the expectations of
>> undecorated, or mostly-undecorated, Latin-character users
>> because they have many years of history leading them to expect
>> that Internet text is, by default, ASCII.
>>
>> In addition, things that could be confused with dots should be
>> treated with special care because we already know that one of
>> the more popular phishing tricks is to create a more complicated
>> URL, pointing to a malware or identity theft site, that the user
>> confuses with a popular and likely domain such as a banking site
>> or popular storefront.
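
To make the dot-confusion concern concrete, here is a small Python sketch. It assumes only U+002E FULL STOP is treated as a label separator; the domain name and the helper `dns_labels` are hypothetical and chosen purely for illustration.

```python
KATAKANA_MIDDLE_DOT = "\u30fb"


def dns_labels(name):
    """Split a name into labels on U+002E only.

    U+30FB is ordinary label content here, not a separator.
    """
    return name.split(".")


# Hypothetical name: reads like three dot-separated parts on screen.
name = "paypal" + KATAKANA_MIDDLE_DOT + "safest.example"
labels = dns_labels(name)
```

Here `labels` holds two entries, not three: the middle dot sits inside the first label, so what a reader may take for a hierarchy boundary is just another character in a single registered label.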
>>
>> In summary, I think it is clear that the particular contextual
>> rule associated with this character should be correct and should
>> optimally balance having a narrow and precise rule with
>> complexity.  I'm not competent to get that right and am very
>> pleased that those of you who are competent have taken that task
>> over.  But it seems to me that just making it PVALID is
>> problematic enough to require a much stronger argument than
>> we've seen... especially at this late date.
>>
>>    john
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>
>