Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Wil Tan wil at cloudregistry.net
Fri Jul 24 17:15:10 CEST 2009


Thanks for bringing this up, Kenneth. I agree that this definitely needs fixing.

On Fri, Jul 24, 2009 at 10:21 AM, Kenneth Whistler <kenw at sybase.com> wrote:
> A.9. KATAKANA MIDDLE DOT
>
> On this one, there is a long thread from April 3 - 7
> entitled "Tables and contextual rule for Katakana middle dot"
> that started with John Klensin's observation:
>
> <quote>
> Just so this doesn't accidentally fall through the cracks...
>
> It is clear from the discussion last week that I simply got the
> contextual rule for Katakana Middle Dot (U+30FB) wrong in what
> is rule/Appendix A.12 in Tables-05.  I had understood that I had
> been told it was used only with Katakana; the JET I-D and
> Monday's presentation make it clear to me (and I assume others)
> that it can be used between any pair of Japanese characters.
> The overview now reads:
>
>   Adjacent characters MUST be Katakana.
>
> It should be:
>
>   Adjacent characters MUST be Hiragana, Katakana, or Han.
>
> The associated Rule Set will, of course, have to be updated to
> match.
> </quote>
>
> I concur with that general assessment, although Yoneya-san
> noted that the Katakana middle dot also occurs in
> other (Japanese) contexts, including before or after
> ([a-zA-Z0-9]). Yoneya-san's assessment was:
>
>    (KATAKANA MIDDLEDOT) MUST be used in Japanese context.
>
> And the thread then foundered and moved on to other topics,
> because nobody really knows how to specify that in a rule.
>

You might have missed the latest recommendation from Yoneya-san dated April 8th:

<quote>
Excluding Alphabet and digit causes somewhat implications to existing
registration, but I couldn't find legitimate explanation for including
them as Japanese context.
How to deal with the implications is decision of registries.

Appendix A.12.  KATAKANA MIDDLE DOT
  Code point:
     U+30FB
  Overview:
     MUST be used with at least one Han, Hiragana or Katakana.
  Lookup:
     False
  Rule Set:
     False;
     For All Characters:
       If Script(cp) .eq. ( Han | Hiragana | Katakana ) Then True;
       If cp .in. U+3005..U+3007 Then True;
     End For;
</quote>

To which John and Paul said "works for me" before the thread went off topic.
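
For anyone who wants to try the quoted Rule Set against sample labels, here is a rough Python sketch of how I read it. The code-point ranges are a hand-written stand-in for Script(cp) and are my assumption, not the real Unicode Script property data; the function names are made up for illustration.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Hand-written stand-in for Script(cp) in (Han | Hiragana | Katakana).
# These ranges are an assumption for illustration, not Unicode Scripts.txt data.
JAPANESE_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x309D, 0x309F),  # Hiragana iteration marks and digraph
    (0x30A1, 0x30FA),  # Katakana letters (U+30FB itself is not included)
    (0x30FD, 0x30FF),  # Katakana iteration marks and digraph
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def is_japanese_script(ch):
    """Approximate test for a Han, Hiragana or Katakana character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def middle_dot_allowed(label):
    """Evaluate the quoted Rule Set for a label that contains U+30FB.

    Starts from False and becomes True as soon as any character in the
    label is Han/Hiragana/Katakana or lies in U+3005..U+3007.
    """
    for ch in label:
        if is_japanese_script(ch):
            return True
        if 0x3005 <= ord(ch) <= 0x3007:
            return True
    return False
```

Under this reading a label such as "abc" + U+30FB + "def" is rejected, while any label with a single kana, Han, or U+3005..U+3007 character anywhere in it passes.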


> At any rate I would like to reiterate that this should not fall through
> the cracks, and the Overview and Rule Set for A.9. still need
> updating.

Agreed.

> The options are:
>
> 1. Update as John Klensin suggested.
>
> 2. Add ([a-zA-Z0-9]) to the allowed contexts, to get closer
>   to Japanese usage.
>
> 3. Give up on attempting to write a formal Rule Set for
>   "MUST be used in Japanese context", make the character
>   PVALID instead of CONTEXTO in the Exceptions list,
>   and leave it up to registrars to allow or disallow for
>   country-specific registrations.
>

So there is another option:

4. Update as Yoneya-san suggested (quoted above).

I do share the concerns of Harald, John and Vint that this is after
all a punctuation character, and one that is potentially confusable
with an important protocol character. On the other hand, I also
appreciate its use in Japanese orthography, and there are
presumably lots of names with that character already registered and in
use, so breaking that compatibility would be quite detrimental (though
I presume the mapping draft would take care of it).

As such, my personal take would be to adopt #4, but tighten it
further. As it is proposed, #4 allows the katakana middle dot if the
label contains any of (hiragana|katakana|han|U+3005|U+3006|U+3007).

If the potential for visual confusion is of any concern at all, it
seems that we should require at least one Hiragana|Katakana|Han
character to appear before the middle dot. Also, a character from
U+3005..U+3007 on its own should not be enough to satisfy the rule,
e.g. "www・〇" (that's a katakana middle dot followed by U+3007).

I don't know enough Japanese to list the use cases for it, but do know
that one common use case is in "spelling out" a string of letters.
For example, the registered company name for NTT Communications in
Japan is:
  エヌ・ティ・ティ・コミュニケーションズ
which is really the transliteration of the English string "N-T-T
Communications".
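
To make the tightened variant concrete, here is a minimal Python sketch of one possible reading: every middle dot must have at least one Hiragana, Katakana or Han character somewhere before it in the label, and U+3005..U+3007 do not count. The ranges approximating Script(cp) and the function names are assumptions for illustration only, not data taken from the Unicode Script property.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumed approximation of Script(cp) in (Han | Hiragana | Katakana).
# U+3005..U+3007 are deliberately left out, so they cannot satisfy the rule.
JAPANESE_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x309D, 0x309F),  # Hiragana iteration marks and digraph
    (0x30A1, 0x30FA),  # Katakana letters (U+30FB itself is not included)
    (0x30FD, 0x30FF),  # Katakana iteration marks and digraph
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def is_japanese_script(ch):
    """Approximate test for a Han, Hiragana or Katakana character."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def middle_dot_allowed_tightened(label):
    """Return False if any U+30FB has no Japanese character before it."""
    seen_japanese = False
    for ch in label:
        if ch == MIDDLE_DOT:
            if not seen_japanese:
                return False
        elif is_japanese_script(ch):
            seen_japanese = True
    return True


# The two labels discussed above, written with escapes to avoid ambiguity.
ntt = "\u30a8\u30cc\u30fb\u30c6\u30a3\u30fb\u30c6\u30a3"  # first part of the NTT name
www = "www\u30fb\u3007"                                   # "www", middle dot, U+3007
print(middle_dot_allowed_tightened(ntt), middle_dot_allowed_tightened(www))
```

With this reading the NTT-style label is accepted and "www・〇" is rejected, as is a label that begins with the middle dot.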

All that said, I'm really on the fence and would like to hear from
others on the list.

> I don't think we have the option to leave the A.9. Rule Set
> as it is currently stated, as that is not even minimally
> acceptable in a Japanese context.
>

Agreed.

=wil

