Katakana Middle Dot again (Was: tables-06b.txt: A.5, A.6, A.9)

Harald Tveit Alvestrand harald at alvestrand.no
Fri Aug 7 13:29:10 CEST 2009


Yoshiro YONEYA wrote:
> Dear John,
>
>> U+3005 and U+3007 are identified as "Han" in the Unicode table
>> Scripts.txt, so need no special treatment.
>
> Exactly!
>
>> U+3006 (IDEOGRAPHIC CLOSING MARK) is listed as in "Common"
>> script in that table.   Without understanding the use of this
>> character, is it plausible that it would occur in a label that
>> consisted only of it, the middle dot, and, e.g., Romaji? If it
>> is not going to be used except when other ideographic characters
>> are present, there is no need to make an exception, although a
>> comment might be in order.  Remember that, as you suggested, the
>> test now requires only a single character that is unambiguously
>> Hiragana, Katakana, or Han.
>
> U+3006 (IDEOGRAPHIC CLOSING MARK) is a kind of simplified form
> of U+7DE0.  U+3006 is sometimes substituted for U+7DE0 when the
> latter is used in the sense of "closing", so U+3006 should be
> treated the same as Han.

I reiterate the question:

Is it reasonable to assume that there is a legitimate desire to
register labels that contain IDEOGRAPHIC CLOSING MARK and KATAKANA
MIDDLE DOT, but no other Han, Hiragana or Katakana character?

Again, we are seeking a justification for overriding a Unicode
determination. I don't understand the reason for the determination that
placed U+7DE0 in script "Han" but U+3006 in script "Common", but in
general we have tried to keep the number of special exceptions to the
rules derived from Unicode properties as small as possible.
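
To make the question concrete, here is a minimal sketch of the test as
described in the quoted text: KATAKANA MIDDLE DOT is acceptable only when
the label contains at least one character that is unambiguously Hiragana,
Katakana, or Han. The table SCRIPT_RANGES and the function names are
hypothetical, and the ranges are a simplified stand-in for Scripts.txt that
covers only the code points discussed in this thread, not the full Unicode
data.

```python
# Simplified, illustrative stand-in for the Unicode Scripts.txt table.
# Only the code points discussed in this thread are covered; the block
# ranges are approximations, not the complete script assignments.
SCRIPT_RANGES = [
    (0x3005, 0x3005, "Han"),       # IDEOGRAPHIC ITERATION MARK
    (0x3006, 0x3006, "Common"),    # IDEOGRAPHIC CLOSING MARK
    (0x3007, 0x3007, "Han"),       # IDEOGRAPHIC NUMBER ZERO
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x30FB, 0x30FB, "Common"),    # KATAKANA MIDDLE DOT
    (0x4E00, 0x9FFF, "Han"),       # CJK Unified Ideographs block (includes U+7DE0)
]

KATAKANA_MIDDLE_DOT = "\u30FB"
JAPANESE_SCRIPTS = {"Hiragana", "Katakana", "Han"}


def script_of(ch):
    """Return the script name from the sketch table, or None if not covered."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def middle_dot_allowed(label):
    """Apply the contextual test with no special exception for U+3006."""
    if KATAKANA_MIDDLE_DOT not in label:
        return True  # the rule only constrains labels containing the dot
    return any(script_of(ch) in JAPANESE_SCRIPTS for ch in label)
```

Under these assumptions, middle_dot_allowed("\u3006\u30FB") is False, because
both characters are in script "Common", while middle_dot_allowed("\u7DE0\u30FB")
is True. An exception for U+3006 would matter only for labels of the first kind.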

       Harald



More information about the Idna-update mailing list