Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)
eblanconil at gmail.com
Fri Aug 7 22:13:39 CEST 2009
God bless Japan!
How many languages, scripts and orthotypographies around the world have
no Kenneth to speak up on their behalf, and will suffer from the IETF
IDNA layer violation? Or, most probably, simply disregard it. How much
easier would it have been to organise a TLD Table inheritance system and
let the top zone managers decide about the permitted code points or
Was that not, at one time, a proposal by John Klensin?
Oops! This would have been a problem for the "internationalized"
money-making gTLDs ICANN wants to sell. But if I understand correctly,
Congress people want to address the "Internet for the Rich" problem as
well, and turn ICANN into a permanent US agency.
I may seem off topic, but I am not sure I am. A good architecture
should be independent of business fashions. As Jefsey says, the
problem is that IDNA is not even considered by the IAB in RFC 3869 as a
matter of development priority. IMHO the real issue is to make sure,
in the final wording, that Class, TLD, presentation, or Zone related
character restrictions or exceptions can be documented without
contradicting the proposed text - in order to avoid unnecessary
2009/8/7 Kenneth Whistler <kenw at sybase.com>:
> O.k., it looks like I have to wade in on this thread now. :-)
> John said:
>> If this is really a symbol, punctuation, or spacing mark --as
>> the name implies-- then our general principles would argue for
>> banning it entirely.
> O.k., first let's get this misconception off the table.
> The IDEOGRAPHIC CLOSING MARK has *nothing* whatsoever to
> do with punctuation. This isn't "CLOSING" in the sense of
> "closing punctuation" or anything of the sort.
> U+3006 IDEOGRAPHIC CLOSING MARK is an abbreviated form that
> Japanese shopkeepers hang up on their doors to indicate
> the shop is closed. It is literally read "shime", which
> means 'closed (not open for business)', from the verb
> "shimeru" 'to close'. It is basically the Japanese equivalent
> of this:
> When Yoneya-san talks about this "shime" being equivalent
> to the CJK ideograph U+7DE0, it isn't that U+7DE0 is
> a *character* equivalent to U+3006 per se, but rather that
> U+7DE0 is the ordinary kanji used to write the verb
> "shime(ru)" (or "shima(ru)") -- in actual writing U+7DE0 is
> used just for the "shi" root part of "shimeru", and you
> would follow it by U+3081 to write the Hiragana
> syllable "me". And a shopkeeper might post a sign that
> has just U+7DE0 as another way to indicate a shop is closed.
>> Unless someone makes the case for its
>> having been misclassified, I don't see a reason to make an
>> exception to Unicode's classification of it as "Lo", so it would
>> remain a PVALID character.
> It isn't misclassified. In origin, U+3006 is a handwriting
> abbreviation for "shime", so it has something in common
> with other digraphic abbreviatory forms like the more
> recently encoded U+309F HIRAGANA DIGRAPH YORI.
> U+3006 has the additional attribute that it has long been
> treated as a kind of honorary ideograph, because it stands
> for the verb "shime(ru)" in the same way that the actual,
> traditional, correct CJK ideograph U+7DE0 does. And because
> of its use as a "content" element, it is classed in the
> UCD as General_Category=Lo, but it is also classed as
> The reason why U+3006 is given Script=Common, instead of
> Script=Han, is that it is in origin a derivative of
> Hiragana forms, but isn't formally Hiragana, nor is it
> formally a CJK Ideograph. Think of it as being a kind
> of letterlike symbol, but one which is used in context
> of Han, Hiragana, and Katakana in the Japanese writing
> system, like a number of other letterlike symbols or
> actual symbol-symbols in the 30XX blocks in Unicode.
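
The General_Category values Ken cites can be checked with a short sketch like the one below. It assumes Python's standard unicodedata module, which exposes General_Category and character names but not the Script property, so the Script=Common claim is not verified here. The describe helper is a hypothetical name chosen for illustration.

```python
import unicodedata

# Code points mentioned in this thread.
CODE_POINTS = [0x3006, 0x30FB, 0x7DE0, 0x3081, 0x309F]


def describe(cp):
    """Return (U+XXXX, General_Category, character name) for a code point.

    Hypothetical helper for illustration; unicodedata has no Script
    property, so only category and name are reported.
    """
    ch = chr(cp)
    return ("U+%04X" % cp, unicodedata.category(ch), unicodedata.name(ch))


for cp in CODE_POINTS:
    print(" ".join(describe(cp)))
```

The point of interest is that U+3006 reports Lo (a letter) while U+30FB reports Po (punctuation), which is why only the middle dot needs an exception to be usable in labels at all.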
>> But, just as was the case for
>> Middle Dot, I think we need to hear a compelling argument for
>> why it is actually necessary to have labels that consist only of
>> one or more closing marks and middle dots.
> On that point, I would differ somewhat with Yoneya-san on
> whether there is anything compelling about this.
>> At least for me, it would help to know how a label consisting of
>> U+3006 U+30FB
>> would be pronounced and what it would mean.
> It would be pronounced "shime", but that is somewhat beside
> the point.
>> It would also help me to understand how a normal (not computer
>> expert) reader of Japanese would read
>> U+30A2 U+30AA U+30FB U+30A2
>> as different from
>> U+30A2 U+30AA U+30FB U+30A2 U+3006
>> in a label.
> both of which are nonsensical, of course.
> It would be possible to make a case for just U+3006 all by
> itself in a label, although odd -- the way someone has registered
> and used the radical sign U+221A as a label, and actually has
> a website up for it. Since U+3006 is PVALID and otherwise
> unconstrained, that is allowed by IDNA2008 currently.
> I don't see any strong case for U+3006 *and* U+30FB without
> any other Han or Hiragana characters. It just wouldn't mean
> much. The U+30FB is a little like adding a "-", and unless
> you connect it to something meaningful, there isn't much
> point to it.
>> Otherwise, I think that the observation that Harald and I have
>> made in different ways should probably apply: It is in
>> everyone's interest to minimize the number of exceptions to
>> those that are really needed to support the writing system. In
>> this example, I believe that case has been made for permitting
>> Katakana Middle Dot despite the fact that it is classified as
>> punctuation. But the idea of making a second exception just to
>> support an exception makes me very nervous, especially if the
>> argument for it is that someone might desire such a label.
> My opinion is that it isn't worth making an exception to
> the exception for U+30FB, just to allow it to work with
> U+3006 "shime" alone. That is an edge case of an edge case, and
> I cannot envision a strong enough claim on its necessity to
> justify yet another exception to the rule. Anybody who wanted
> to use U+3006 in a label with the katakana middle dot could
> make it work by simply adding one more Japanese character --
> virtually any other Japanese character would work.
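
A minimal sketch of the contextual rule as it seems to be understood in this thread: a label containing U+30FB is acceptable only if it also contains at least one character whose Script is Hiragana, Katakana or Han. The code point ranges below are an assumption standing in for the real Script property data (they omit, for example, U+3005 and the supplementary-plane ideographs), and the function names are hypothetical.

```python
# Approximate ranges for Script=Hiragana, Katakana and Han.
# Assumption: a simplification of the UCD Scripts data, not the full table.
JAPANESE_SCRIPT_RANGES = [
    (0x3041, 0x3096),  # Hiragana letters
    (0x309D, 0x309F),  # Hiragana iteration marks and DIGRAPH YORI
    (0x30A1, 0x30FA),  # Katakana letters
    (0x30FD, 0x30FF),  # Katakana iteration marks and DIGRAPH KOTO
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]

KATAKANA_MIDDLE_DOT = "\u30FB"


def has_japanese_script_char(label):
    """True if any character falls in one of the approximate ranges above."""
    return any(
        lo <= ord(ch) <= hi
        for ch in label
        for (lo, hi) in JAPANESE_SCRIPT_RANGES
    )


def middle_dot_allowed(label):
    """Check only the middle-dot context; other IDNA rules are ignored."""
    if KATAKANA_MIDDLE_DOT not in label:
        return True  # the rule only constrains labels that use U+30FB
    return has_japanese_script_char(label)


print(middle_dot_allowed("\u3006\u30FB"))        # shime + middle dot only
print(middle_dot_allowed("\u3006\u30FB\u7DE0"))  # one kanji added
```

Because U+3006 and U+30FB are both Script=Common, neither falls in the ranges, so the first label fails the check while the second passes once a single kanji is added. That is the "add one more Japanese character" workaround Ken describes.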