You made good points; I agree.<br><br clear="all">Mark<br>
<br><br><div class="gmail_quote">On Wed, Jul 15, 2009 at 16:20, Kenneth Whistler <span dir="ltr">&lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Mark suggested:<br>
<div class="im"><br>
> If so, we could do this by changing Table 2.9 to be:<br>
><br>
> 2.9. Other Exclusions by Property (I)<br>
> I: Hangul_Syllable_Type(cp) is in {L, V, T} or<br>
> (General_Category(cp) is Lm and Block(cp) = CJK_Symbols_And_Punctuation)<br>
><br>
> This category consists of all conjoining Hangul Jamo (Leading Jamo,<br>
> Vowel Jamo, and Trailing Jamo), plus exclusion of Letter Modifiers in the<br>
> CJK_Symbols_And_Punctuation block.<br>
><br>
> Elimination of conjoining Hangul Jamos from the set of PVALID<br>
> characters results in restricting the set of Korean PVALID characters<br>
> just to preformed, modern Hangul syllable characters. Old Hangul<br>
> syllables, which must be spelled with sequences of conjoining Hangul<br>
> Jamos, are not PVALID for IDNs.<br>
><br>
> These particular letter modifiers are not required in normal presentation.<br>
<br>
</div>I oppose that suggestion.<br>
<br>
1. It dilutes the intent of 2.9, which is currently just focussed<br>
on removing Hangul jamo, and turns it into another grab-bag<br>
exception category. That is what 2.6 Exceptions (F) is for.<br>
<br>
2. By seeking to provide a property derivation that just happens<br>
to fit the list of exceptions in question, it essentially hides<br>
the fact that this is none other than an exception list<br>
masquerading as a principled filtering by properties.<br>
You could do the same thing for everything else in the<br>
2.6 Exceptions (F) list.<br>
<br>
The Arabic-Indic digits (both sets):<br>
<br>
(General_Category(cp) = Nd and Block(cp) = Arabic)<br>
<br>
The geresh and gershayim:<br>
<br>
(General_Category(cp) = Po and Block(cp) = Hebrew and Word_Break(cp) = ALetter)<br>
<br>
U+00B7 MIDDLE DOT:<br>
<br>
(General_Category(cp) = Po and Block(cp) = Latin_1 and Word_Break(cp) = MidLetter)<br>
<br>
And so on.<br>
<br>
3. Building such derivations into the rules list in idnabis-tables.txt might<br>
seem to be an elegant way to avoid listing exceptions and to gain<br>
extensibility at the same time. However, in this case, it does<br>
neither.<br>
<br>
a. First of all, the block in question is filled already. No other<br>
characters can ever be added to it. So you are gaining no generality<br>
whatsoever by writing a "rule" that is restricted to an already<br>
closed set.<br>
<br>
b. As opposed to a fixed exception list, you actually *open* the document<br>
to a problem should the UTC ever decide that the General_Category of<br>
any *other* character in that block should be changed to gc=Lm.<br>
Suddenly, through a side effect that nobody will remember at the time,<br>
and that will only be reported long after the fact, that<br>
decision could tip a PVALID character<br>
into the DISALLOWED category, by virtue of a rule too clever by half.<br>
<br>
So just fix the exception list to take care of U+303B.<br>
<br>
Then you're done with the topic and can move on.<br>
<br>
--Ken<br>
<br>
</blockquote></div><br>
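Ken's point 3b in the quoted message can be sketched in Python. This is only an illustration under stated assumptions: the block range U+3000..U+303F is hard-coded because the standard library's unicodedata module exposes General_Category but not Block, and the function names, the one-element exception list, and the simulated reclassification of U+3007 are hypothetical. None of it reflects the actual contents of idnabis-tables.txt or any real UTC decision.<br>

```python
import unicodedata

# Assumption: block range hard-coded, since unicodedata has no Block property.
CJK_SYMBOLS_AND_PUNCTUATION = range(0x3000, 0x3040)

# Hypothetical fixed exception list; only U+303B is taken from the thread.
FIXED_EXCEPTIONS = frozenset({0x303B})


def disallowed_by_rule(cp, category=unicodedata.category):
    """Property-derived rule: gc=Lm and Block=CJK_Symbols_And_Punctuation."""
    return cp in CJK_SYMBOLS_AND_PUNCTUATION and category(chr(cp)) == "Lm"


def disallowed_by_list(cp):
    """Fixed exception list: membership never depends on property values."""
    return cp in FIXED_EXCEPTIONS


def reclassified(cp_changed):
    """Build a category lookup that pretends cp_changed became gc=Lm.

    Purely a simulation of a hypothetical future property change.
    """
    def lookup(ch):
        if ord(ch) == cp_changed:
            return "Lm"
        return unicodedata.category(ch)
    return lookup


if __name__ == "__main__":
    future = reclassified(0x3007)  # hypothetical change, for illustration only
    for cp in (0x303B, 0x3007):
        print(
            f"U+{cp:04X}",
            "rule now:", disallowed_by_rule(cp),
            "rule after change:", disallowed_by_rule(cp, future),
            "fixed list:", disallowed_by_list(cp),
        )
```

Under the simulated category change, the rule-based check silently starts excluding U+3007, while the fixed list gives the same answer before and after.<br>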