Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)
John C Klensin
klensin at jck.com
Fri Aug 7 21:57:57 CEST 2009
This is extremely helpful, as usual. Many thanks.
--On Friday, August 07, 2009 12:35 -0700 Kenneth Whistler
<kenw at sybase.com> wrote:
> O.k., it looks like I have to wade in on this thread now. :-)
> John said:
>> If this is really a symbol, punctuation, or spacing mark --as
>> the name implies-- then our general principles would argue for
>> banning it entirely.
> O.k., first let's get this misconception off the table.
> The IDEOGRAPHIC CLOSING MARK has *nothing* whatsoever to
> do with punctuation. This isn't "CLOSING" in the sense of
> "closing punctuation" or anything of the sort.
> U+3006 IDEOGRAPHIC CLOSING MARK is an abbreviated form that
> Japanese shopkeepers hang up on their doors to indicate
> the shop is closed. It is literally read "shime", which
> means 'closed (not open for business)', from the verb
> "shimeru" 'to close'. It is basically the Japanese equivalent
> of this:
> When Yoneya-san talks about this "shime" being equivalent
> to the CJK ideograph U+7DE0, it isn't that U+7DE0 is
> a *character* equivalent to U+3006 per se, but rather that
> U+7DE0 is the ordinary kanji used to write the verb
> "shime(ru)" (or "shima(ru)") -- in actual writing U+7DE0 is
> used just for the "shi" root part of "shimeru", and you
> would follow it by U+3081 to write the Hiragana
> syllable "me". And a shopkeeper might post a sign that
> has just U+7DE0 as another way to indicate a shop is closed.
>> Unless someone makes the case for its
>> having been misclassified, I don't see a reason to make an
>> exception to Unicode's classification of it as "Lo", so it
>> would remain a PVALID character.
> It isn't misclassified. In origin, U+3006 is a handwriting
> abbreviation for "shime", so it has something in common
> with other digraphic abbreviatory forms like the more
> recently encoded U+309F HIRAGANA DIGRAPH YORI.
> U+3006 has the additional attribute that it has long been
> treated as a kind of honorary ideograph, because it stands
> for the verb "shime(ru)" in the same way that the actual,
> traditional, correct CJK ideograph U+7DE0 does. And because
> of its use as a "content" element, it is classed in the
> UCD as General_Category=Lo, but it is also classed as
> Script=Common.
> The reason why U+3006 is given Script=Common, instead of
> Script=Han, is that it is in origin a derivative of
> Hiragana forms, but isn't formally Hiragana, nor is it
> formally a CJK Ideograph. Think of it as being a kind
> of letterlike symbol, but one which is used in context
> of Han, Hiragana, and Katakana in the Japanese writing
> system, like a number of other letterlike symbols or
> actual symbol-symbols in the 30XX blocks in Unicode.
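The General_Category values mentioned above can be checked locally. This is only a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration, and the module does not expose the Script property, so Script=Common cannot be confirmed this way.

```python
import unicodedata

# Code points that come up in this thread.
CODE_POINTS = [0x3006, 0x30FB, 0x7DE0, 0x3081, 0x309F]


def describe(cp):
    """Return (General_Category, character name) from the local Unicode data.

    `describe` is a hypothetical helper, not part of any IDNA tooling.
    """
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for cp in CODE_POINTS:
        category, name = describe(cp)
        print(f"U+{cp:04X} {category} {name}")
```

The results depend on the Unicode version bundled with the Python build in use.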
>> But, just as was the case for
>> Middle Dot, I think we need to hear a compelling argument for
>> why it is actually necessary to have labels that consist only
>> of one or more closing marks and middle dots.
> On that point, I would differ somewhat with Yoneya-san on
> whether there is anything compelling about this.
>> At least for me, it would help to know how a label consisting
>> of U+3006 U+30FB
>> would be pronounced and what it would mean.
> It would be pronounced "shime", but that is somewhat beside
> the point.
>> It would also help me to understand how a normal (not computer
>> expert) reader of Japanese would read
>> U+30A2 U+30AA U+30FB U+30A2
>> as different from
>> U+30A2 U+30AA U+30FB U+30A2 U+3006
>> in a label.
> both of which are nonsensical, of course.
> It would be possible to make a case for just U+3006 all by
> itself in a label, although odd -- the way someone has
> registered and used the radical sign U+221A as a label, and
> actually has a website up for it. Since U+3006 is PVALID and
> otherwise unconstrained, that is allowed by IDNA2008 currently.
> I don't see any strong case for U+3006 *and* U+30FB without
> any other Han or Hiragana characters. It just wouldn't mean
> much. The U+30FB is a little like adding a "-", and unless
> you connect it to something meaningful, there isn't much
> point to it.
>> Otherwise, I think that the observation that Harald and I have
>> made in different ways should probably apply: It is in
>> everyone's interest to minimize the number of exceptions to
>> those that are really needed to support the writing system.
>> In this example, I believe that case has been made for
>> permitting Katakana Middle Dot despite the fact that it is
>> classified as punctuation. But the idea of making a second
>> exception just to support an exception makes me very nervous,
>> especially if the argument for it is that someone might
>> desire such a label.
> My opinion is that it isn't worth making an exception to
> the exception for U+30FB, just to allow it to work with
> U+3006 "shime" alone. That is an edge case of an edge case, and
> I cannot envision a strong enough claim on its necessity to
> justify yet another exception to the rule. Anybody who wanted
> to use U+3006 in a label with the katakana middle dot could
> make it work by simply adding one more Japanese character --
> virtually any other Japanese character would work.
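To make the position above concrete, here is a rough sketch of the kind of contextual check being discussed: a label containing U+30FB is accepted only if it also contains at least one Han, Hiragana, or Katakana character. It is not the rule text from the tables draft. The function names are hypothetical, and because Python's `unicodedata` has no Script property, the script test is approximated by character-name prefixes, which is an assumption rather than a faithful Script lookup.

```python
import unicodedata

KATAKANA_MIDDLE_DOT = "\u30fb"

# Assumption: approximate Script=Han/Hiragana/Katakana by name prefix.
# U+3006 IDEOGRAPHIC CLOSING MARK deliberately matches none of these,
# mirroring its Script=Common classification.
JAPANESE_NAME_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "HIRAGANA LETTER",
    "KATAKANA LETTER",
)


def is_japanese_script_char(ch):
    """Hypothetical stand-in for a real Script property lookup."""
    return unicodedata.name(ch, "").startswith(JAPANESE_NAME_PREFIXES)


def middle_dot_allowed(label):
    """Return True if the label may contain U+30FB under the sketched rule."""
    if KATAKANA_MIDDLE_DOT not in label:
        return True  # nothing to check
    return any(is_japanese_script_char(ch) for ch in label)
```

Under this sketch, `middle_dot_allowed("\u3006\u30fb")` is rejected, while adding any one kanji or kana to the label, for example `"\u7de0\u3006\u30fb"`, makes it acceptable.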