Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Sat Jul 25 14:34:31 CEST 2009

On Sat, Jul 25, 2009 at 6:52 PM, John C Klensin <klensin at jck.com> wrote:
>
> --On Saturday, July 25, 2009 18:30 +1000 Wil Tan
>> 2. That at least one (Han|Hiragana|Katakana) character should
>> come before the katakana middle dot; and
>>
>> 3. That the label contains only (Han|Hiragana|Katakana|LDH) +
>> middle dot.
>
> Makes sense to me but, again, the boundaries of appropriate
> usage of this character is outside my competence.

I accidentally left out the U+3005..U+3007 that Yoneya-san proposed.
Therefore, #3 should be:

  3. That the label contains only
     (Han|Hiragana|Katakana|LDH|U+3005..U+3007) + katakana middle dot.

It's important to note that having these constraints would rule out:

a. planting the katakana middle dot in an all-ASCII label (a
legitimate use-case, but one that Yoneya-san was willing to live
without)

b. katakana middle dot used to concatenate two strings, the first of
which is all-ASCII (arguably more common than above, so disallowing
this may well be unnecessarily restrictive, but is meant to mitigate
phishing concerns)

c. katakana middle dot at the beginning of a label (I don't know how
common this is.)

I don't pretend to know this well, so it'd be great if Yoneya-san and
others who are familiar with this could weigh in.
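
As a rough illustration only, constraints 2 and 3 could be sketched in
Python as below. The code point ranges are hand-picked stand-ins for the
Unicode Script property, the helper names are made up for this sketch,
and "come before" is assumed to mean "anywhere earlier in the label";
none of this is text from the drafts.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Approximate ranges (assumption); real code would consult Scripts.txt.
CJK_RANGES = (
    (0x3400, 0x4DBF),  # Han: CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # Han: CJK Unified Ideographs
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters
)
EXTRA_RANGE = (0x3005, 0x3007)  # U+3005..U+3007, as proposed by Yoneya-san
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def is_cjk(ch):
    """True if ch falls in the approximate Han/Hiragana/Katakana ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def is_allowed(ch):
    """True if ch may appear in a label that contains the middle dot."""
    cp = ord(ch)
    return (
        ch == MIDDLE_DOT
        or ch in LDH
        or is_cjk(ch)
        or EXTRA_RANGE[0] <= cp <= EXTRA_RANGE[1]
    )


def middle_dot_ok(label):
    """Apply constraints 2 and 3 to a single (already lowercased) label."""
    if MIDDLE_DOT not in label:
        return True  # the rule only concerns labels using the middle dot
    if not all(is_allowed(ch) for ch in label):
        return False  # constraint 3: other characters present
    seen_cjk = False
    for ch in label:
        if ch == MIDDLE_DOT and not seen_cjk:
            return False  # constraint 2: no Han/Hiragana/Katakana before the dot
        seen_cjk = seen_cjk or is_cjk(ch)
    return True
```

Under these assumptions, "abc" + MIDDLE_DOT + "def" (case a),
"abc" + MIDDLE_DOT + "日本" (case b) and MIDDLE_DOT + "日本" (case c) are
all rejected, while "日本" + MIDDLE_DOT + "abc" is accepted.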

>> However, it makes the rule considerably more complex and
>> because of this I was thinking more of leaving this to the
>> application, which may have more contextual information (such
>> as user's locale, the TLD, etc.) to take appropriate steps to
>> protect the user.
>
> It is a question, again, of how to draw the line.  In some
> sense, the way that IDNA works makes "in the IDNA protocol" and
> "in the application" different versions of the same thing -- all
> of IDNA occurs "in the application" although API design, etc.,
> may change perceptions of that.   The argument about  how
> normative mapping should be has its mirror image here.
> Personally, I can live with "general prohibition on
> registration; recommend that lookup-side applications be
> extra-careful with this stuff".  That is more or less what
> CONTEXTO is about and would be consistent with what I think you
> are suggesting above (the text in Protocol may need tuning;
> suggested text welcome).     Somewhat more would also work for
> me if we can fairly clearly justify it.
>
> If applications start drawing lines differently on what should
> have been registered, we get the most inconsistent behavior
> possible.  That is why Protocol contains language requiring that
> anything that is Lookup-Valid must be looked up, even if one
> decides to warn the user first.
>

Thanks, for me this is a useful answer to the meta question of "where
to draw the line". Still, it is a difficult balancing act: the rules
need to be simple enough to implement, yet tight enough to prevent
confusion, while still allowing legitimate usage.

To help myself make sense of the recent contextual rules discussions,
I'm trying to picture the "guidelines" for deciding how to craft the
rules. Is the following reasonable?

1. Overview: describe the expected contexts in which the subject
character is allowed to appear.
2. Rule set: if the contexts are simple and narrow enough to capture,
they can be expressed in the pseudocode. For more complex ones, a
simple check may suffice, leaving the registry to place additional
constraints on the subject character. The rule set need not capture
everything in the overview.

This seems to be consistent with recommendations given elsewhere by
Mark and others.

Applying those "guidelines" to the katakana middle dot, the overview would be:

  This character is used in Japanese orthography to concatenate strings
  containing characters in the Hiragana, Katakana and Han scripts,
  or in any of the following sets: [a-z0-9\-], U+3005..U+3007.

and the rule set would probably just be "True", because the rule won't
be simple, and certainly won't be narrow, so trying to capture it in
pseudocode would be pointless. However, having a rule is still better
than having none, because it flags to the registry that the character
needs policy around it, and tells the application to be extra careful
with it.
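
A minimal Python sketch of that idea follows: a contextual rule that
always evaluates to True but still marks the character as one needing
attention. The table layout and the function names are hypothetical,
chosen only for this illustration, and are not taken from the drafts.

```python
KATAKANA_MIDDLE_DOT = 0x30FB


def rule_katakana_middle_dot(label, position):
    """The 'True' rule set: accept the character in any context."""
    return True


# Hypothetical table mapping a code point to its contextual rule.
CONTEXT_RULES = {KATAKANA_MIDDLE_DOT: rule_katakana_middle_dot}


def check_label(label):
    """Return (ok, flagged).

    ok is False as soon as any contextual rule fails; flagged lists the
    code points that have a rule, i.e. those for which a registry needs
    policy and an application should take extra care.
    """
    flagged = []
    for position, ch in enumerate(label):
        rule = CONTEXT_RULES.get(ord(ch))
        if rule is None:
            continue  # no contextual rule for this character
        flagged.append(ord(ch))
        if not rule(label, position):
            return False, flagged
    return True, flagged
```

With this table, check_label("abc\u30fbdef") passes the protocol-level
check but reports U+30FB in the flagged list, which is where registry
policy or an application warning could hook in.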

> Be careful, however, about any assumptions of actions based  on
> TLDs.   The presence of DNAME and the lack of any "give me back
> the canonical/primary tree" function in the DNS makes that one
> very fragile even if there were no other issues.
>

Point taken.

Thanks.

=wil