Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

John C Klensin klensin at jck.com
Fri Aug 7 12:33:45 CEST 2009



--On Friday, August 07, 2009 19:16 +0900 Yoshiro YONEYA
<yone at jprs.co.jp> wrote:

> Dear Patrik,
> 
>> False;
>> For All Characters:
>>     If Script(cp) .in. {Hiragana, Katakana, Han} Then True;
> 
> Please include U+3005..U+3007 into the scripts set because
> they are also  Japanese character family.

U+3005 and U+3007 are identified as "Han" in the Unicode table
Scripts.txt, so they need no special treatment.

U+3006 (IDEOGRAPHIC CLOSING MARK) is listed as "Common" script
in that table.  I do not understand how this character is used:
is it plausible that it would occur in a label that consisted
only of it, the middle dot, and, e.g., Romaji?  If it is not
going to be used except when other ideographic characters are
present, there is no need to make an exception, although a
comment might be in order.  Remember that, as you suggested, the
test now requires only a single character that is unambiguously
Hiragana, Katakana, or Han.
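
To make the consequence concrete, here is a rough Python sketch of
the quoted rule.  It is an illustration only, not the text of the
tables document: the names SCRIPT_RANGES, script() and
has_japanese_context() are made up for this example, the range table
is a small hand-copied excerpt of Scripts.txt rather than the full
file, and any code point outside the excerpt is simply reported as
"Other".

```python
# Hand-copied excerpt of Scripts.txt (an assumption for illustration;
# the real file is much larger and range ends vary by Unicode version).
SCRIPT_RANGES = [
    (0x3005, 0x3005, "Han"),       # IDEOGRAPHIC ITERATION MARK
    (0x3006, 0x3006, "Common"),    # IDEOGRAPHIC CLOSING MARK
    (0x3007, 0x3007, "Han"),       # IDEOGRAPHIC NUMBER ZERO
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x30FB, 0x30FB, "Common"),    # KATAKANA MIDDLE DOT
    (0x4E00, 0x9FA5, "Han"),       # core CJK Unified Ideographs
]

JAPANESE_SCRIPTS = {"Hiragana", "Katakana", "Han"}


def script(cp):
    """Return the script of a code point according to the excerpt above."""
    for first, last, name in SCRIPT_RANGES:
        if first <= cp <= last:
            return name
    return "Other"  # simplification: everything not in the excerpt


def has_japanese_context(label):
    """True if at least one character is unambiguously Hiragana,
    Katakana, or Han, which is what the quoted rule tests for."""
    return any(script(ord(ch)) in JAPANESE_SCRIPTS for ch in label)
```

Under these assumptions a label made up of U+3005 and the middle dot
passes, because U+3005 is Han, while a label made up only of U+3006,
the middle dot, and Latin letters does not, because none of its
characters is in one of the three scripts.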

I'd also appreciate comments from those more closely involved
with Unicode as to whether this would be an appropriate
exception and what other consequences that decision might have. 

    john


