U+303B VERTICAL IDEOGRAPHIC ITERATION MARK

Kenneth Whistler kenw at sybase.com
Thu Jul 16 01:20:33 CEST 2009


Mark suggested:

> If so, we could do this by changing Tables 2.9 to be:
> 
> 2.9.  Other Exclusions by Property (I)
>    I: Hangul_Syllable_Type(cp) is in {L, V, T} or
>       (General_Category(cp) is Lm and Block(cp) = CJK_Symbols_And_Punctuation)
> 
>    This category consists of all conjoining Hangul Jamo (Leading Jamo,
>    Vowel Jamo, and Trailing Jamo), plus exclusion of Letter Modifiers in the
>    CJK_Symbols_And_Punctuation block
> 
>    Elimination of conjoining Hangul Jamos from the set of PVALID
>    characters results in restricting the set of Korean PVALID characters
>    just to preformed, modern Hangul syllable characters.  Old Hangul
>    syllables, which must be spelled with sequences of conjoining Hangul
>    Jamos, are not PVALID for IDNs.
> 
>    These particular letter modifiers are not required in normal presentation.

I oppose that suggestion.

1. It dilutes the intent of 2.9, which is currently just focussed
   on removing Hangul jamo, and turns it into another grab-bag
   exception category. That is what 2.6 Exceptions (F) is for.
   
2. By seeking to provide a property derivation that just happens
   to fit the list of exceptions in question, it essentially hides
   the fact that this is none other than an exception list
   masquerading as a principled filtering by properties.
   You could do the same thing for everything else in the
   2.6 Exceptions (F) list.
   
   The Arabic-Indic digits (both sets):
   
   (General_Category(cp) = Nd and Block(cp) = Arabic)
   
   The geresh and gershayim:
   
   (General_Category(cp) = Po and Block(cp) = Hebrew and Word_Break(cp) = ALetter)
   
   U+00B7 MIDDLE DOT:
   
   (General_Category(cp) = Po and Block(cp) = Latin_1 and Word_Break(cp) = MidLetter)
   
   And so on.
   
3. Building such derivations into the rules list in idnabis-tables.txt might
   seem to be an elegant way to avoid listing exceptions and to gain
   extensibility at the same time. However, in this case, it does
   neither.
   
   a. First of all, the block in question is filled already. No other
      characters can ever be added to it. So you are gaining no generality
      whatsoever by writing a "rule" that is restricted to an already
      closed set.
      
   b. As opposed to a fixed exception list, you actually *open* the document
      to a problem should the UTC ever decide that the General_Category of
      any *other* character in that block should be changed to gc=Lm.
      Suddenly, through a side effect that nobody will remember at the time,
      and which will only be reported much later after the fact, that
      decision could tip a PVALID character into the DISALLOWED category,
      by virtue of a rule too clever by half.
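
To make the risk in 3b concrete, here is a minimal Python sketch. It makes
several assumptions: the block range U+3000..U+303F is hard-coded because
Python's unicodedata module exposes no Block property; the names
matches_rule, today, reclassified and later are invented for illustration;
and the reclassification of U+3006 to gc=Lm is purely hypothetical, not
something the UTC has proposed. The snapshot "today" stands in for a fixed
list only to show the mechanism; it is not the exception list in the draft.

```python
import unicodedata

# Assumption: CJK Symbols and Punctuation is U+3000..U+303F, hard-coded
# here because unicodedata has no Block property.
BLOCK = range(0x3000, 0x3040)

def matches_rule(cp, gc=unicodedata.category):
    """Property-style rule: gc=Lm inside the CJK Symbols and Punctuation block."""
    return cp in BLOCK and gc(chr(cp)) == "Lm"

# What the rule catches with the Unicode data shipped in this Python.
today = frozenset(cp for cp in BLOCK if matches_rule(cp))

def reclassified(ch):
    # Hypothetical future: pretend U+3006 has been changed to gc=Lm.
    return "Lm" if ch == "\u3006" else unicodedata.category(ch)

# The same rule, evaluated against the hypothetical future data.
later = frozenset(cp for cp in BLOCK if matches_rule(cp, reclassified))

# Code points silently swept up by the rule after the change.
newly_caught = sorted(later - today)
print([f"U+{cp:04X}" for cp in newly_caught])
```

A frozen set such as "today" does not move when the property data changes,
whereas the rule quietly picks up the reclassified character.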
      
So just fix the exception list to take care of U+303B.

Then you're done with the topic and can move on.

--Ken


