Tables and contextual rule for Katakana middle dot
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Wed Apr 8 06:43:45 CEST 2009
First, I'm sympathetic to the view that punctuation should in general be
excluded. But I don't think it's that easy. There is a continuum between
characters such as "." (a period in punctuation, not really necessary in
a label), "'" (an apostrophe in some contexts, but also part of many
English words; we have become accustomed to not having it in domain
names, but I guess quite a few people would be rooting for it if domain
names for English weren't already a done deal), and characters that are
closer to letters than the English apostrophe is.
Second, I'm personally not too worried about the visual confusability of
・ (middle dot). And I don't think visual confusability in handwriting,
the way John has described it, is really what we should check for.
I'm sure everybody would, to a first approximation, read dot-like marks
between Latin characters as ordinary dots, and would read them as middle
dots only if the handwriting was very careful. That Latin-oriented OCR
software gets things wrong isn't surprising either; such programs get a
lot of things wrong, and they definitely can't recognize characters they are not
Also, I'm not worried about Japanese being "famous for artistic
calligraphy and font design". There are a lot of fancy fonts for
Japanese, but far fewer than for European scripts (the large number of
characters makes creating a font much more expensive), and, exactly as
for European scripts, such fonts are not customarily used to display
domain names. And even in these fonts, there isn't much artistry going
on with dots and middle dots.
I have written to a group of Japanese typography experts, the authors of
http://www.w3.org/TR/2008/WD-jlreq-20081015/, many of whom were also
involved in JIS X 4051, and asked them for feedback, from a typographic
point of view, on the contexts in which the middle dot is used. I'll
relay whatever I get from them here.
If I had to decide now, I would conclude that the middle dot can be
allowed in the protocol, and that only registries that think their users
need it, such as the Japanese one, should allow it. But I would also be
okay with permitting the middle dot only in contexts where there is a
Kanji, Hiragana, or Katakana character on at least one side. In my view,
middle dots between Latin characters simply occur in practice because
the middle dot is available, but they can easily be replaced by the
hyphen, which is typographically more appropriate for Latin.
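To make the second option concrete, here is a minimal sketch, assuming the rule is "every KATAKANA MIDDLE DOT (U+30FB) must have a Kanji, Hiragana, or Katakana character immediately before or after it". The function names `is_japanese_letter` and `middle_dots_ok` are hypothetical, and script membership is approximated with a few Unicode block ranges rather than the real Unicode Script property, so this is an illustration, not a proposed implementation.

```python
# Hypothetical sketch of the "Kanji, Hiragana, or Katakana on at least one
# side" rule for KATAKANA MIDDLE DOT (U+30FB). Script membership is
# APPROXIMATED by Unicode block ranges (an assumption); a real implementation
# would consult the Unicode Script property instead.

MIDDLE_DOT = "\u30fb"

# Simplified ranges; not the complete Script data.
_JAPANESE_RANGES = (
    (0x3041, 0x309F),  # Hiragana block (letters, sound marks, iteration marks)
    (0x30A1, 0x30FA),  # Katakana letters; deliberately excludes U+30FB itself
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def is_japanese_letter(ch: str) -> bool:
    """True if ch falls in one of the approximate Han/Hiragana/Katakana ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _JAPANESE_RANGES)


def middle_dots_ok(label: str) -> bool:
    """Check every U+30FB in label against the neighbour-on-one-side rule."""
    for i, ch in enumerate(label):
        if ch != MIDDLE_DOT:
            continue
        left = label[i - 1] if i > 0 else None
        right = label[i + 1] if i + 1 < len(label) else None
        left_ok = left is not None and is_japanese_letter(left)
        right_ok = right is not None and is_japanese_letter(right)
        if not (left_ok or right_ok):
            return False  # dot sits only next to non-Japanese characters
    return True


if __name__ == "__main__":
    # "ヤマダ・タロウ": Katakana on both sides of the dot
    print(middle_dots_ok("\u30e4\u30de\u30c0\u30fb\u30bf\u30ed\u30a6"))  # True
    # "東京・tokyo": Han on the left is enough
    print(middle_dots_ok("\u6771\u4eac\u30fbtokyo"))  # True
    # "play・station・4": Latin letters on both sides of each dot
    print(middle_dots_ok("play\u30fbstation\u30fb4"))  # False
```

Under this sketch a label with Latin letters on both sides of a middle dot is rejected, which is exactly the case where, as argued above, a hyphen would serve just as well.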
On 2009/04/07 22:35, Harald Alvestrand wrote:
> Yoshiro YONEYA wrote:
>> Dear Patrik-san,
>> Japanese uses Hiragana, Katakana, Han, alphabet letters (a-z), and
>> digits (0-9) for names. KATAKANA MIDDLEDOT is commonly used in such
>> names, so cases like the following really exist and are in use:
>> Play<KATAKANA MIDDLEDOT>Station<KATAKANA MIDDLEDOT>4.jp
>> That is the reason why I said "Japanese context".
>> To be precise, the Japanese scripts (for IDN) consist of:
>> Hiragana, Katakana, Han, Alphabet, Digit,
>> IDEOGRAPHIC CLOSING MARK, IDEOGRAPHIC NUMBER ZERO,
>> KATAKANA MIDDLEDOT and IDEOGRAPHIC ITERATION MARK
>> Removing Alphabet and Digit from the list is unacceptable.
>> I'll try to express this ambiguous situation more clearly.
> Speaking with sadness:
> If this is the case, I think we will have to declare KATAKANA MIDDLE DOT
> to have the same status as the apostrophe: Not permitted.
> Idna-update mailing list
> Idna-update at alvestrand.no
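Yoneya-san's repertoire can be written down as a simple membership check. The minimal sketch below assumes that Hiragana, Katakana, and Han are approximated by their Unicode blocks, that labels are already lowercased, and that the four named characters are U+3005 IDEOGRAPHIC ITERATION MARK, U+3006 IDEOGRAPHIC CLOSING MARK, U+3007 IDEOGRAPHIC NUMBER ZERO, and U+30FB KATAKANA MIDDLE DOT. The function name `in_japanese_repertoire` is hypothetical. Note that in the PlayStation example every middle dot sits between Latin letters or digits, which is why dropping Alphabet and Digit from the context is unacceptable to him.

```python
# Hypothetical sketch of the "Japanese context" repertoire quoted above:
# Hiragana, Katakana, Han, a-z, 0-9, plus four named characters.
# Script membership is APPROXIMATED by Unicode block ranges (assumption),
# and labels are assumed to be lowercased already.

_NAMED = {
    "\u3005",  # IDEOGRAPHIC ITERATION MARK
    "\u3006",  # IDEOGRAPHIC CLOSING MARK
    "\u3007",  # IDEOGRAPHIC NUMBER ZERO
    "\u30fb",  # KATAKANA MIDDLE DOT
}

_BLOCKS = (
    (0x3040, 0x309F),  # Hiragana block
    (0x30A0, 0x30FF),  # Katakana block
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def in_japanese_repertoire(label: str) -> bool:
    """True if every character of label is in the (approximated) repertoire."""
    for ch in label:
        # Named characters, ASCII lowercase letters, and ASCII digits.
        if ch in _NAMED or "a" <= ch <= "z" or "0" <= ch <= "9":
            continue
        cp = ord(ch)
        if not any(lo <= cp <= hi for lo, hi in _BLOCKS):
            return False
    return True


if __name__ == "__main__":
    print(in_japanese_repertoire("play\u30fbstation\u30fb4"))  # True
    print(in_japanese_repertoire("\u6771\u4eac"))  # True ("東京")
    print(in_japanese_repertoire("caf\u00e9"))  # False (é is not listed)
```

Following the list literally, this sketch also rejects the hyphen, which the quoted list does not mention; a registry table would presumably add it.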
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp