I agree.<br><br clear="all">Mark<br>
<br><br><div class="gmail_quote">On Fri, Apr 10, 2009 at 08:24, John C Klensin <span dir="ltr"><<a href="mailto:klensin@jck.com">klensin@jck.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Martin,<br>
<br>
This all makes sense. The information that 人人、would be<br>
orthographically wrong is one of the important bits I wanted to<br>
confirm. Based on your note and Yoneya-san's, I think we should<br>
get the iteration marks out of the CONTEXT category entirely,<br>
making the vertical ones DISALLOWED and the others that are in<br>
Lm PVALID.<br>
<br>
Thanks to you, Yoneya-san, and the others who have<br>
commented for your patience.<br>
<br>
john<br>
<br>
<br>
--On Friday, April 10, 2009 11:24 +0900 "Martin J. Dürst"<br>
<div><div></div><div class="h5"><<a href="mailto:duerst@it.aoyama.ac.jp">duerst@it.aoyama.ac.jp</a>> wrote:<br>
<br>
> Hello John,<br>
><br>
> On 2009/04/09 19:10, John C Klensin wrote:<br>
>><br>
>> --On Thursday, April 09, 2009 16:59 +0900 "Martin J.<br>
>> Dürst" <<a href="mailto:duerst@it.aoyama.ac.jp">duerst@it.aoyama.ac.jp</a>> wrote:<br>
>><br>
>>> I understand that there is a desire to add some context<br>
>>> constraints for middle dot, but I don't understand why we<br>
>>> need constraints for Ideographic Iteration Mark. In my<br>
>>> opinion, the context given by Yoshiro is correct, but the<br>
>>> chance that this character gets confused with something else<br>
>>> is as big or as little as any other randomly picked<br>
>>> character, so I don't see why we would need context. Is it<br>
>>> that this is a punctuation character, that we can only<br>
>>> exceptionally include punctuation characters, and only if<br>
>>> they have context?<br>
>><br>
>> Middle dot (U+30FB) is a punctuation character (Po), so it is<br>
>> allowed only by exception and, for the reasons mentioned<br>
>> earlier, it makes sense to make the exception as narrow as<br>
>> possible.<br>
><br>
> Agreed.<br>
><br>
>> I no longer remember why we treated U+3005 as requiring<br>
>> context. It is Lm in the tables, which brings it under<br>
>> Category A (Section 2.1) in Tables, so, absent other<br>
>> considerations, it ought to default to PVALID. I note that<br>
>> there are several other iteration marks that are just PVALID.<br>
>> I imagine that U+3005 was called out for special treatment<br>
>> because the Unicode Standard identifies it as part of a "CJK<br>
>> Symbols and Punctuation" block (see page 830 of TUS 5.0). Its<br>
>> presence in the Contextual rule list may consequently be an<br>
>> artifact of the time in which we were still treating the<br>
>> Unicode block structure as significant.<br>
>><br>
>> On a fast scan, there doesn't seem to be anything in<br>
>> Stringprep that calls it out for special treatment. At least<br>
>> at the registry level, none of the iteration marks appear to<br>
>> be Preferred Variants for Chinese (see<br>
>> <a href="http://www.iana.org/domains/idn-tables/tables/cn_zh-cn_4.0.html" target="_blank">http://www.iana.org/domains/idn-tables/tables/cn_zh-cn_4.0.html</a><br>
</div></div>>> or the identical table for .TW); some, but not all, of them<br>
<div class="im">>> appear in the .JP Preferred Variants list of Japanese (see<br>
>> <a href="http://www.iana.org/domains/idn-tables/tables/jp_ja-jp_1.2.html" target="_blank">http://www.iana.org/domains/idn-tables/tables/jp_ja-jp_1.2.html</a><br>
</div>>> ). .KR has filed only a Hangul table with IANA, so I can<br>
<div><div></div><div class="h5">>> make no inferences there.<br>
>><br>
>> So, if I can ask your indulgence to satisfy my curiosity and<br>
>> slightly reduce my ignorance,<br>
>><br>
>> (i) Are these iteration marks used with Japanese only<br>
>> (out of the CJK script group)?<br>
><br>
> I don't remember having seen it in Chinese, and I have seen<br>
> explicit character repetition in Chinese, but I rarely look at<br>
> Chinese (and don't read it), so that doesn't mean too much.<br>
> But<br>
> <a href="http://en.wiktionary.org/wiki/Category:Japanese-only_CJKV_Characters" target="_blank">http://en.wiktionary.org/wiki/Category:Japanese-only_CJKV_Characters</a><br>
> also lists it as a Japanese-only character.<br>
><br>
>> (ii) How are they used? It may be just an incorrect<br>
>> inference from terminology, but, if I saw something<br>
>> called an "iteration mark", I'd normally expect it to be<br>
>> associated with a numeral that would tell me how many<br>
>> copies of an associated character or string to infer.<br>
><br>
> That's thinking too far. 々 (U+3005) is simply used to repeat<br>
> the previous character. So 人 (hito) means man, person and<br>
> 人々 (hitobito, note the assimilation from h to b) means<br>
> men, people (only used in certain cases; in general, 人 can<br>
> be used for plural, too). 人々 may have originally been written<br>
> 人人、but these days, that would be orthographically wrong.<br>
> There is no device e.g. for a threefold repetition, which is<br>
> not too surprising, because such repetitions don't occur in<br>
> practice. See also <a href="http://en.wiktionary.org/wiki/%E3%80%85" target="_blank">http://en.wiktionary.org/wiki/々</a>.<br>
><br>
>> (iii) Is there any possible reason why some of the<br>
>> iteration marks should be treated as PVALID and others<br>
>> should be CONTEXTO?<br>
><br>
> Not as far as I can imagine. There are good reasons for<br>
> having some PVALID, and there are good reasons for having<br>
> others disallowed, but not CONTEXTO.<br>
><br>
>> (iv) If "vertical" really means that, is U+303B needed<br>
>> in domain names at all? Are they ever, in practice,<br>
>> written vertically? I note that the .JP table<br>
>> (reference above) does not permit that character at all.<br>
>> If it is not used, not useful, and could cause<br>
>> conceptual confusion (can it?), then should it be<br>
>> DISALLOWED rather than PVALID or CONTEXTO?<br>
><br>
> I think Yoshiro already said that the vertical ones are not<br>
> needed and should be disallowed. That applies to all of<br>
> U+3031-3035. They are only used in vertical text, and<br>
> therefore don't work for domain names (which are usually<br>
> horizontal).<br>
><br>
><br>
>> I think that this takes us in the direction of removing U+3005<br>
>> and U+303B from the exception list, letting them fall into<br>
>> PVALID because of their Lm classification (unless U+303B<br>
>> should be DISALLOWED as discussed above). But, to the extent<br>
>> possible, it would be good to understand a bit more about the<br>
>> situation first, even though this takes us rather far into the<br>
>> character-by-character analysis that we try to avoid.<br>
><br>
> If we don't want to go too far with character-by-character<br>
> analysis, we can leave the business of excluding U+3031-3035<br>
> to registries.<br>
><br>
> Regards, Martin.<br>
<br>
<br>
<br>
<br>
</div></div><div><div></div><div class="h5">_______________________________________________<br>
Idna-update mailing list<br>
<a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br>
<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>
</div></div></blockquote></div><br>
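For anyone who wants to check the General Category values this thread relies on, here is a minimal Python sketch. It only illustrates the proposal as discussed above (Lm falls through to PVALID, the vertical marks U+3031-3035 are DISALLOWED); it is not the IDNA Tables algorithm. The names VERTICAL_MARKS and proposed_status, and the "UNDECIDED" label for everything else, are my own assumptions for illustration. Whether U+303B belongs in the disallowed set is left open in the thread, so the set is a parameter.<br>

```python
import unicodedata

# Hypothetical set for illustration: the vertical repeat marks that
# Martin names in the thread (U+3031..U+3035). U+303B is deliberately
# left out because the thread does not settle it.
VERTICAL_MARKS = {chr(cp) for cp in range(0x3031, 0x3036)}


def proposed_status(ch, disallowed=VERTICAL_MARKS):
    """Toy classification following the proposal in this thread.

    This is NOT the IDNA Tables derivation; it covers only the two
    rules discussed here and reports everything else as UNDECIDED.
    """
    if ch in disallowed:
        return "DISALLOWED"
    if unicodedata.category(ch) == "Lm":
        return "PVALID"
    return "UNDECIDED"  # outside the scope of this sketch


if __name__ == "__main__":
    # U+3005 and U+303B: iteration marks; U+3031: vertical kana repeat
    # mark; U+30FB: katakana middle dot (punctuation).
    for ch in ["\u3005", "\u303B", "\u3031", "\u30FB"]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {proposed_status(ch)}")
```

Passing a different set as the second argument (for example one that also contains U+303B) shows how the outcome changes if that character is disallowed as well.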