Hmmm. To answer that,<br><ul><li>I'd look first at the characters with "VERTICAL" and "REPEAT" or "ITERATION" in their name:</li><ul><li><a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5Cp{name%3D%2FVERTICAL.*%28REPEAT|ITERATION%29%2F}">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{name%3D%2FVERTICAL.*%28REPEAT|ITERATION%29%2F}</a></li>
</ul><li>Picking one, we can see its properties:</li><ul><li><a href="http://unicode.org/cldr/utility/character.jsp?a=3032">http://unicode.org/cldr/utility/character.jsp?a=3032</a></li></ul><li>All of them are in the block [:Block=CJK_Symbols_And_Punctuation:] and have General_Category Letter Modifier (Lm), so we can intersect the two:</li>
<ul><li><a href="http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%5Cp{Block%3DCJK_Symbols_And_Punctuation}%26%5Cp{lm}]">http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\p{Block%3DCJK_Symbols_And_Punctuation}%26\p{lm}]</a></li></ul><li>The only additional character in this set is <code><a target="c" href="http://unicode.org/cldr/utility/character.jsp?a=3005">U+3005</a></code> ( 々 ) IDEOGRAPHIC ITERATION MARK. It is already called out specially as a context character in section <a name="section-2.6">2.6</a>, Exceptions (F), so we don't have to worry about it.<br>
</li><li>So we could use the above set.</li></ul>If so, we could do this by changing section 2.9 of Tables to read:<br><br><pre class="newpage"><span class="h3"><h3><a name="section-2.9">2.9</a>. Other Exclusions by Property (I)</h3>
</span><br> I: Hangul_Syllable_Type(cp) is in {L, V, T} or<br> (General_Category(cp) is Lm and Block(cp) = CJK_Symbols_And_Punctuation)<br><br> This category consists of all conjoining Hangul Jamo (Leading Jamo,<br>
Vowel Jamo, and Trailing Jamo), plus the Letter Modifiers (Lm) in the<br> CJK_Symbols_And_Punctuation block.<br><br> Elimination of conjoining Hangul Jamos from the set of PVALID<br> characters results in restricting the set of Korean PVALID characters<br>
just to preformed, modern Hangul syllable characters. Old Hangul<br> syllables, which must be spelled with sequences of conjoining Hangul<br> Jamos, are not PVALID for IDNs.<br><br> These particular letter modifiers are not required in normal presentation.<br>
</pre><br clear="all">Mark<br>
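To make the cross-check above concrete, here is a hypothetical Python sketch (standard library only) of the name search, the Lm/block intersection, and the proposed rule (I). It is an illustration, not the Tables algorithm: Python's unicodedata exposes names and General_Category but not Block or Hangul_Syllable_Type, so the block range U+3000..U+303F and the Jamo ranges are hard-coded assumptions, and the helper names are made up. Results also depend on the Unicode version bundled with the interpreter.<br>

```python
import re
import sys
import unicodedata

# Assumption: the CJK_Symbols_And_Punctuation block is U+3000..U+303F.
CJK_SYMBOLS_AND_PUNCTUATION = range(0x3000, 0x3040)

# Assumption: code points with Hangul_Syllable_Type L, V or T, copied from
# HangulSyllableType.txt of a recent Unicode version (older versions
# assigned fewer of them).
HANGUL_JAMO_LVT = (
    range(0x1100, 0x1160),  # L: Leading Jamo
    range(0xA960, 0xA97D),  # L: Hangul Jamo Extended-A
    range(0x1160, 0x11A8),  # V: Vowel Jamo
    range(0xD7B0, 0xD7C7),  # V: Hangul Jamo Extended-B
    range(0x11A8, 0x1200),  # T: Trailing Jamo
    range(0xD7CB, 0xD7FC),  # T: Hangul Jamo Extended-B
)


def names_matching(pattern: str) -> set[int]:
    """Step 1: every code point whose character name matches `pattern`."""
    rx = re.compile(pattern)
    return {
        cp
        for cp in range(sys.maxunicode + 1)
        if rx.search(unicodedata.name(chr(cp), ""))
    }


def lm_in_cjk_symbols_block() -> set[int]:
    """Step 2: General_Category Lm intersected with the block."""
    return {
        cp
        for cp in CJK_SYMBOLS_AND_PUNCTUATION
        if unicodedata.category(chr(cp)) == "Lm"
    }


def excluded_by_property(cp: int) -> bool:
    """Proposed rule (I) for section 2.9 (hypothetical helper name)."""
    if any(cp in r for r in HANGUL_JAMO_LVT):
        return True  # conjoining Jamo: Hangul_Syllable_Type is L, V or T
    return (
        cp in CJK_SYMBOLS_AND_PUNCTUATION
        and unicodedata.category(chr(cp)) == "Lm"
    )


if __name__ == "__main__":
    vertical = names_matching(r"VERTICAL.*(REPEAT|ITERATION)")
    by_property = lm_in_cjk_symbols_block()
    print("Unicode", unicodedata.unidata_version)
    # Characters picked up by the property rule but not by the name search.
    for cp in sorted(by_property - vertical):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

Run as a script, it prints the Unicode version followed by a single line, U+3005 IDEOGRAPHIC ITERATION MARK, the one extra character noted above. The rule is written against properties rather than a list of code points, in the spirit of avoiding character-by-character rules; a real implementation would derive the ranges from the UCD files for the Unicode version in use.<br>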
<br><br><div class="gmail_quote">On Wed, Jul 15, 2009 at 14:43, Vint Cerf <span dir="ltr">&lt;<a href="mailto:vint@google.com">vint@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
If we can possibly avoid character-by-character rules, that would be very helpful<br>
in dealing with updates to Unicode.<br>
<br>
I gather these characters don't quite fall into a category that would<br>
permit algorithmic treatment?<br>
<font color="#888888"><br>
vint<br>
</font><div><div></div><div class="h5"><br>
<br>
On Jul 15, 2009, at 5:06 PM, Eric Brunner-Williams wrote:<br>
<br>
> Kenneth Whistler wrote:<br>
>> I agree with Wil Tan about this.<br>
>><br>
>> The Vertical Kana repeat marks (3031..3035) make no sense<br>
>> in IDNs, particularly since they will certainly be forced<br>
>> into horizontal display contexts, where they could accomplish<br>
>> nothing but introduce mischief and confusion.<br>
>><br>
><br>
> This, "... since they will certainly be forced into horizontal display<br>
> contexts ..." is just what I meant when attempting to discuss what I<br>
> called at the time (SF +/- some) the "linearization" of descending<br>
> script, Arabic script in particular. I'm also concerned about<br>
> non-Cyrillic Mongolian, which is vertical, for similar reasons.<br>
><br>
> The point I was attempting to make earlier (SF +/-), circa TATWEEL, is<br>
> that a requirement for single-baseline script doesn't arise from a<br>
> registrar requirement. It may arise elsewhere, but if we can't state<br>
> where the requirement comes from, it doesn't exist; and where a vertical<br>
> script uses vertical character sequence conventions, such as iteration<br>
> marks, the rationale for action can't be "it doesn't work<br>
> horizontally".<br>
><br>
> I'm not disagreeing with Wil, and possibly Ken, only noting concern<br>
> about a preference for display contexts.<br>
><br>
> Eric<br>
>> As for U+303B VERTICAL IDEOGRAPHIC ITERATION MARK, it is<br>
>> also useless in IDNs, and I don't think it is helpful or<br>
>> pertinent to clutter up the CONTEXTO rules in the Appendix A<br>
>> listing trying to come up with an appropriate rule for this.<br>
>><br>
>> As for attempting to stand on principle that IDNA should not<br>
>> categorize characters as DISALLOWED unless shown to be<br>
>> harmful, we already crossed that bridge a long time ago<br>
>> by ruling thousands of symbols as DISALLOWED on general<br>
>> principle, even though they are less problematical than<br>
>> these vertical display characters.<br>
>><br>
>> And finally, there is no good reason whatsoever why U+303B<br>
>> should be CONTEXTO (and have that stand as some kind of<br>
>> precedent that we can't reverse to make it DISALLOWED<br>
>> in the table), when all these other, more problematical<br>
>> vertical form characters are sitting in the table as PVALID<br>
>> and not CONTEXTO. So from the point of view of<br>
>> consistency and minimal confusion for implementers,<br>
>> the best choice is to make the lot DISALLOWED and be done<br>
>> with it -- *particularly* if we agree that:<br>
>><br>
>> "Sane registry policy everywhere will still probably set this to<br>
>> registry-disallowed."<br>
>><br>
>> --Ken<br>
>><br>
>><br>
>>> I think the following should be DISALLOWED:<br>
>>><br>
>>> U+3031: Lm: VERTICAL KANA REPEAT MARK<br>
>>> U+3032: Lm: VERTICAL KANA REPEAT WITH VOICED SOUND MARK<br>
>>> U+3033: Lm: VERTICAL KANA REPEAT MARK UPPER HALF<br>
>>> U+3034: Lm: VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF<br>
>>> U+3035: Lm: VERTICAL KANA REPEAT MARK LOWER HALF<br>
>>> U+303B: Lm: VERTICAL IDEOGRAPHIC ITERATION MARK<br>
>>><br>
>>> Mainly because U+3033 looks like a protocol character (forward slash)<br>
>>> and is thus harmful, IMO. Also, this is a group of characters with<br>
>>> related usage, and Yoneya-san, Martin Dürst and John suggested that<br>
>>> they should be disallowed:<br>
>>> <a href="http://www.alvestrand.no/pipermail/idna-update/2009-April/004398.html" target="_blank">http://www.alvestrand.no/pipermail/idna-update/2009-April/004398.html</a><br>
>>><br>
>>> =wil<br>
>>><br>
>><br>
>> _______________________________________________<br>
>> Idna-update mailing list<br>
>> <a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
>> <a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</div></div></blockquote></div><br>