Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Vint Cerf vint at google.com
Fri Aug 7 22:00:28 CEST 2009


Thanks - I would read this as recommending to treat U+3006
(IDEOGRAPHIC CLOSING MARK) as a PVALID character
that can be used with other PVALID characters and that it
can be used as an enabler for the use of the Katakana
Middle Dot (U+30FB).

Have I correctly understood your intent?


On Aug 7, 2009, at 3:35 PM, Kenneth Whistler wrote:

> O.k., it looks like I have to wade in on this thread now. :-)
> John said:
>> If this is really a symbol, punctuation, or spacing mark --as
>> the name implies-- then our general principles would argue for
>> banning it entirely.
> O.k., first let's get this misconception off the table.
> The IDEOGRAPHIC CLOSING MARK has *nothing* whatsoever to
> do with punctuation. This isn't "CLOSING" in the sense of
> "closing punctuation" or anything of the sort.
> U+3006 IDEOGRAPHIC CLOSING MARK is an abbreviated form that
> Japanese shopkeepers hang up on their doors to indicate
> the shop is closed. It is literally read "shime", which
> means 'closed (not open for business)', from the verb
> "shimeru" 'to close'. It is basically the Japanese equivalent
> of this:
> http://www.nottsprepared.gov.uk/np_home/closed_sign2.jpg
> When Yoneya-san talks about this "shime" being equivalent
> to the CJK ideograph U+7DE0, it isn't that U+7DE0 is
> a *character* equivalent to U+3006 per se, but rather that
> U+7DE0 is the ordinary kanji used to write the verb
> "shime(ru)" (or "shima(ru)") -- in actual writing U+7DE0 is
> used just for the "shi" root part of "shimeru", and you
> would follow it by U+3081 to write the Hiragana
> syllable "me". And a shopkeeper might post a sign that
> has just U+7DE0 as another way to indicate a shop is closed.
>> Unless someone makes the case for its
>> having been misclassified, I don't see a reason to make an
>> exception to Unicode's classification of it as "Lo", so it would
>> remain a PVALID character.
> It isn't misclassified. In origin, U+3006 is a handwriting
> abbreviation for "shime", so it has something in common
> with other digraphic abbreviatory forms like the more
> recently encoded U+309F HIRAGANA DIGRAPH YORI.
> U+3006 has the additional attribute that it has long been
> treated as a kind of honorary ideograph, because it stands
> for the verb "shime(ru)" in the same way that the actual,
> traditional, correct CJK ideograph U+7DE0 does. And because
> of its use as a "content" element, it is classed in the
> UCD as General_Category=Lo, but it is also classed as
> Ideographic=True.
> The reason why U+3006 is given Script=Common, instead of
> Script=Han, is that it is in origin a derivative of
> Hiragana forms, but isn't formally Hiragana, nor is it
> formally a CJK Ideograph. Think of it as being a kind
> of letterlike symbol, but one which is used in context
> of Han, Hiragana, and Katakana in the Japanese writing
> system, like a number of other letterlike symbols or
> actual symbol-symbols in the 30XX blocks in Unicode.
>> But, just as was the case for
>> Middle Dot, I think we need to hear a compelling argument for
>> why it is actually necessary to have labels that consist only of
>> one or more closing marks and middle dots.
> On that point, I would differ somewhat with Yoneya-san on
> whether there is anything compelling about this.
>> At least for me, it would help to know how a label consisting of
>>   U+3006 U+30FB
>> would be pronounced and what it would mean.
> It would be pronounced "shime", but that is somewhat beside
> the point.
>> It would also help me to understand how a normal (not computer
>> expert) reader of Japanese would read
>>  U+30A2 U+30AA U+30FB U+30A2
> ao-a
>> as different from
>>  U+30A2 U+30AA U+30FB U+30A2 U+3006
> ao-ashime
>> in a label.
> both of which are nonsensical, of course.
> It would be possible to make a case for just U+3006 all by
> itself in a label, although odd -- the way someone has registered
> and used the radical sign U+221A as a label, and actually has
> a website up for it. Since U+3006 is PVALID and otherwise
> unconstrained, that is allowed by IDNA2008 currently.
> I don't see any strong case for U+3006 *and* U+30FB without
> any other Han or Hiragana characters. It just wouldn't mean
> much. The U+30FB is a little like adding a "-", and unless
> you connect it to something meaningful, there isn't much
> point to it.
>> Otherwise, I think that the observation that Harald and I have
>> made in different ways should probably apply: It is in
>> everyone's interest to minimize the number of exceptions to
>> those that are really needed to support the writing system.  In
>> this example, I believe that case has been made for permitting
>> Katakana Middle Dot despite the fact that it is classified as
>> punctuation.  But the idea of making a second exception just to
>> support an exception makes me very nervous, especially if the
>> argument for it is that someone might desire such a label.
> My opinion is that it isn't worth making an exception to
> the exception for U+30FB, just to allow it to work with
> U+3006 "shime" alone. That is an edge case of an edge case, and
> I cannot envision a strong enough claim on its necessity to
> justify yet another exception to the rule. Anybody who wanted
> to use U+3006 in a label with the katakana middle dot could
> make it work by simply adding one more Japanese character --
> virtually any other Japanese character would work.
> --Ken
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
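The contextual rule being debated can be sketched roughly as follows. This is a minimal illustration, not the IDNA2008 reference algorithm. It assumes the rule takes the form later published in RFC 5892 (a label containing U+30FB must also contain at least one Hiragana, Katakana, or Han character). It also substitutes a small, deliberately incomplete table of code point ranges for the Unicode Script property, which Python's unicodedata module does not expose. The names enabling_script, katakana_middle_dot_ok and the count_3006_as_enabler flag are hypothetical. The flag models the alternative reading in which U+3006 would also act as an enabler.

```python
import unicodedata

# Hand-picked ranges standing in for the Unicode Script property, which
# Python's unicodedata module does not expose. This table is an
# illustrative ASSUMPTION and is deliberately incomplete.
ENABLER_RANGES = {
    "Hiragana": [(0x3041, 0x3096), (0x309D, 0x309F)],
    "Katakana": [(0x30A1, 0x30FA), (0x30FD, 0x30FF), (0x31F0, 0x31FF)],
    "Han": [(0x3005, 0x3005), (0x3007, 0x3007), (0x3400, 0x4DBF),
            (0x4E00, 0x9FFF), (0xF900, 0xFAFF)],
}

KATAKANA_MIDDLE_DOT = "\u30fb"          # Script=Common, General_Category=Po
IDEOGRAPHIC_CLOSING_MARK = "\u3006"     # Script=Common, so not an enabler by default


def enabling_script(ch):
    """Return 'Hiragana', 'Katakana' or 'Han' if ch falls in the table, else None."""
    cp = ord(ch)
    for script, ranges in ENABLER_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def katakana_middle_dot_ok(label, count_3006_as_enabler=False):
    """Hypothetical check: a label containing U+30FB must also contain at
    least one Hiragana, Katakana or Han character. The flag models the
    alternative reading in which U+3006 would also count as an enabler."""
    if KATAKANA_MIDDLE_DOT not in label:
        return True  # the contextual rule is only triggered by U+30FB
    for ch in label:
        if enabling_script(ch) is not None:
            return True
        if count_3006_as_enabler and ch == IDEOGRAPHIC_CLOSING_MARK:
            return True
    return False


if __name__ == "__main__":
    # General_Category values from the UCD, as discussed in the thread.
    for cp in (0x3006, 0x30FB, 0x7DE0):
        ch = chr(cp)
        print(f"U+{cp:04X} {unicodedata.name(ch)} gc={unicodedata.category(ch)}")

    labels = {
        "U+3006 U+30FB": "\u3006\u30fb",
        "U+30A2 U+30AA U+30FB U+30A2": "\u30a2\u30aa\u30fb\u30a2",
        "U+3006 U+30FB U+7DE0": "\u3006\u30fb\u7de0",
    }
    for desc, label in labels.items():
        print(desc,
              katakana_middle_dot_ok(label),
              katakana_middle_dot_ok(label, count_3006_as_enabler=True))
```

With the flag off, the two-character label U+3006 U+30FB is rejected. Adding any Hiragana, Katakana, or Han character (U+7DE0, for example) makes it acceptable. This is the "exception to the exception" trade-off described above: the flag would only change the outcome for labels whose sole companion to U+30FB is U+3006.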
