Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Fri Jul 24 18:23:19 CEST 2009

There isn't any confusability issue, because the width and placement are so
different. There are many PVALID characters that are far more confusable
with syntax characters than the KATAKANA MIDDLE DOT. Take a look at the
attached image, for example.

Which of the first two is more confusable with a period (Row 3)? The
KATAKANA MIDDLE DOT (Row 1) is clearly different from the period; it is the
Arabic ZERO (Row 2) -- *which is PVALID* -- that is far more
confusable.

There is no reason for KATAKANA MIDDLE DOT to be CONTEXTO: it should just be
made PVALID and be done with it, rather than have all of this last-minute
jiggering.

Mark

On Fri, Jul 24, 2009 at 08:15, Wil Tan <wil at cloudregistry.net> wrote:

> Thanks for bringing this up, Kenneth. I agree that this definitely needs
> fixing.
>
> On Fri, Jul 24, 2009 at 10:21 AM, Kenneth Whistler <kenw at sybase.com>
> wrote:
> > A.9. KATAKANA MIDDLE DOT
> >
> > On this one, there is a long thread from April 3 - 7
> > entitled "Tables and contextual rule for Katakana middle dot"
> > that started with John Klensin's observation:
> >
> > <quote>
> > Just so this doesn't accidentally fall through the cracks...
> >
> > It is clear from the discussion last week that I simply got the
> > contextual rule for Katakana Middle Dot (U+30FB) wrong in what
> > is rule/Appendix A.12 in Tables-05.  I had understood that I had
> > been told it was used only with Katakana; the JET I-D and
> > Monday's presentation make it clear to me (and I assume others)
> > that it can be used between any pair of Japanese characters.
> > The overview now reads:
> >
> >   Adjacent characters MUST be Katakana.
> >
> > It should be:
> >
> >   Adjacent characters MUST be Hiragana, Katakana, or Han.
> >
> > The associated Rule Set will, of course, have to be updated to
> > match.
> > </quote>
> >
> > I concur with that general assessment, although Yoneya-san
> > noted that the Katakana middle dot also occurs in
> > other (Japanese) contexts, including before or after
> > ([a-zA-Z0-9]). Yoneya-san's assessment was:
> >
> >    (KATAKANA MIDDLEDOT) MUST be used in Japanese context.
> >
> > And the thread then foundered and moved on to other topics,
> > because nobody really knows how to specify that in a rule.
> >
>
> You might have missed the latest recommendation from Yoneya-san dated April
> 8th:
>
> <quote>
> Excluding Alphabet and digit causes somewhat implications to existing
> registration,
> but I couldn't find legitimate explanation for including them as
> Japanese context.
> How to deal with the implications is decision of registries.
>
> Appendix A.12.  KATAKANA MIDDLE DOT
>  Code point:
>     U+30FB
>  Overview:
>     MUST be used with at least one Han, Hiragana or Katakana.
>  Lookup:
>     False
>  Rule Set:
>     False;
>     For All Characters:
>       If Script(cp) .eq. ( Han | Hiragana | Katakana ) Then True;
>       If cp .in. U+3005..U+3007 Then True;
>     End For;
> </quote>
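
A minimal Python sketch of the Rule Set quoted above, for readers who want
to try it on sample labels. It is an illustration only, not text from any
draft. The function names are hypothetical, and because Python's standard
library does not expose the Unicode Script property, Script(cp) is
approximated here by a few hard-coded code point ranges (an assumption
that covers only the most common Hiragana, Katakana and Han characters).

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: rough stand-ins for Script(cp) .eq. (Han | Hiragana | Katakana).
# Real Script data covers more ranges than these.
JAPANESE_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (basic block only)
    (0x3005, 0x3007),  # listed explicitly in the quoted Rule Set
)


def in_japanese_context(cp):
    """Return True if the code point falls in one of the ranges above."""
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)


def middle_dot_allowed(label):
    """Apply the quoted rule: start from False, and become True as soon as
    any character in the label qualifies. Labels without the middle dot
    are not subject to the rule at all, so they pass."""
    if MIDDLE_DOT not in label:
        return True
    return any(in_japanese_context(ord(ch)) for ch in label)


if __name__ == "__main__":
    samples = [
        "\u30a8\u30cc\u30fb\u30c6\u30a3",  # Katakana on both sides of the dot
        "www\u30fbcom",                    # ASCII only around the dot
        "www\u30fb\u3007",                 # dot followed by U+3007
    ]
    for label in samples:
        print(ascii(label), middle_dot_allowed(label))
```

Under these assumptions the label made of "www", the middle dot and U+3007
passes, because U+3007 alone satisfies the second clause of the rule.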
>
> To which John and Paul said "works for me" before the thread went off
> topic.
>
>
> > At any rate I would like to reiterate that this should not fall through
> > the cracks, and the Overview and Rule Set for A.9. still need
> > updating.
>
> Agreed.
>
> > The options are:
> >
> > 1. Update as John Klensin suggested.
> >
> > 2. Add ([a-zA-Z0-9]) to the allowed contexts, to get closer
> >   to Japanese usage.
> >
> > 3. Give up on attempting to write a formal Rule Set for
> >   "MUST be used in Japanese context", make the character
> >   PVALID instead of CONTEXTO in the Exceptions list,
> >   and leave it up to registrars to allow or disallow for
> >   country-specific registrations.
> >
>
> So there is another option:
>
> 4. Update as Yoneya-san suggested (quoted above).
>
> I do share the concerns of Harald, John and Vint that this is after
> all a punctuation character, and one that is potentially confusable
> with an important protocol character. On the other hand, I also
> appreciate its use in the Japanese orthography, and there are
> presumably lots of names with that character already registered and in
> use so breaking that compatibility would be quite detrimental (though
> I presume the mapping draft would take care of it.)
>
> As such, my personal take would be to adopt #4, but tighten it
> further. As it is proposed, #4 allows the katakana middle dot if the
> label contains any of (hiragana|katakana|han|U+3005|U+3006|U+3007).
>
> If the potential for visual confusion is of any concern at all, it
> seems that we should require at least one Hiragana|Katakana|Han
> character appearing before the middle dot. Also, it should not be
> possible to just have one of U+3005..U+3007 e.g. "www・〇" (that's a
> katakana middle dot followed by U+3007.)
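
The tightening described above could be sketched in Python as follows.
This is only one possible reading: it assumes that every middle dot needs
at least one Hiragana, Katakana or Han character somewhere earlier in the
label, and that U+3005..U+3007 do not count on their own. The function
names are hypothetical, and the hard-coded ranges are a simplified
stand-in for the Unicode Script property, which Python's standard library
does not provide.

```python
MIDDLE_DOT = "\u30fb"  # KATAKANA MIDDLE DOT

# Assumption: simplified ranges standing in for Hiragana | Katakana | Han.
# U+3005..U+3007 are deliberately left out.
LETTER_RANGES = (
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (basic block only)
)


def is_japanese_letter(ch):
    """Return True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LETTER_RANGES)


def middle_dot_allowed_strict(label):
    """Reject the label if any middle dot appears before the first
    Hiragana, Katakana or Han character."""
    seen_letter = False
    for ch in label:
        if ch == MIDDLE_DOT and not seen_letter:
            return False
        if is_japanese_letter(ch):
            seen_letter = True
    return True


if __name__ == "__main__":
    samples = [
        "\u30a8\u30cc\u30fb\u30c6\u30a3\u30fb\u30c6\u30a3",  # Katakana before each dot
        "www\u30fb\u3007",                                   # only ASCII before the dot
        "\u30fb\u30a8\u30cc",                                # dot is the first character
    ]
    for label in samples:
        print(ascii(label), middle_dot_allowed_strict(label))
```

With this reading, a label that has only ASCII before the middle dot is
rejected even when U+3007 follows it.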
>
> I don't know enough Japanese to list the use cases for it, but do know
> that one common use case is "spelling out" a string of letters.
> For example, the registered company name for NTT Communications in
> Japan is:
>  エヌ・ティ・ティ・コミュニケーションズ
> which is really the transliteration of the English string "N-T-T
> Communications".
>
> All that said, I'm really on the fence and would like to hear from
> others on the list.
>
> > I don't think we have the option to leave the A.9. Rule Set
> > as it is currently stated, as that is not even minimally
> > acceptable in a Japanese context.
> >
>
> Agreed.
>
> =wil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dot picture.tiff
Type: image/tiff
Size: 3902 bytes
Desc: not available
Url : http://www.alvestrand.no/pipermail/idna-update/attachments/20090724/47705031/attachment.tiff