Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Sat Jul 25 10:30:24 CEST 2009

John,

I share your view that this character (and, for that matter, the other
middle dot) is similar enough to an important protocol character, the
dot that separates DNS labels, that it warrants extra care. If we are
to go down that road, I'd like to see:

1. That the contextual rule be validated at lookup time as well. Since
it is easy and attractive (to phishers) to mint DNS labels at lower
levels of the tree, where registration-time checks are unlikely to be
enforced, not requiring the lookup check does nothing to mitigate the
concerns. This was also raised by Chris (and, in a similar vein, by
Shawn);

2. That at least one (Han|Hiragana|Katakana) character come before
the Katakana Middle Dot; and

3. That the label contain only (Han|Hiragana|Katakana|LDH) characters
plus the middle dot.

However, this makes the rule considerably more complex, and for that
reason I was leaning toward leaving it to the application, which may
have more contextual information (such as the user's locale, the TLD,
etc.) with which to take appropriate steps to protect the user.

=wil

On Sat, Jul 25, 2009 at 5:44 PM, John C Klensin <klensin at jck.com> wrote:
>
>
> --On Saturday, July 25, 2009 12:54 +1000 Wil Tan
> <wil at cloudregistry.net> wrote:
>
>> Though I had concerns before, I agree with you that we should
>> just make it PVALID.
>>
>> I still think that the character should be constrained to be
>> used within a Japanese context only (which means
>> Han|Hiragana|Katakana|LDH) and not with arbitrary scripts, but
>> it hardly seems worthwhile for us to define a contextual rule
>> for it.
>
> While it can clearly be changed, let's not do this lightly.
> Precisely because "Japanese context" includes Romaji, a
> registry-applied "no mixed script" rule doesn't prevent
> embedding Katakana Middle Dot in an otherwise all-Latin string.
> And, as I (and, if I recall correctly, Ken and others) have said
> several times, one cannot rely on the appearance of the "normal"
> Unicode glyphs to differentiate characters in cases like this,
> but must accept both wide differences in font design and "you
> see what you expect to see" user expectations.  IMO, we should,
> in that regard, pay special attention to the expectations of
> undecorated, or mostly undecorated, Latin-character users,
> because they have many years of history leading them to expect
> that Internet text is, by default, ASCII.
>
> In addition, things that could be confused with dots should be
> treated with special care, because we already know that one of
> the more popular phishing tricks is to create a more complicated
> URL, pointing to a malware or identity-theft site, that the user
> confuses with a popular and likely domain such as a banking site
> or a well-known storefront.
>
> In summary, I think it is clear that the particular contextual
> rule associated with this character should be correct and should
> strike the best balance between a narrow, precise rule and
> complexity.  I'm not competent to get that right and am very
> pleased that those of you who are competent have taken over that
> task.  But it seems to me that just making it PVALID is
> problematic enough to require a much stronger argument than
> we've seen... especially at this late date.
>
>    john