Tables and contextual rule for IDEOGRAPHIC ITERATION MARKs

Mark Davis mark at macchiato.com
Fri Apr 10 20:41:06 CEST 2009


I agree.

Mark


On Fri, Apr 10, 2009 at 08:24, John C Klensin <klensin at jck.com> wrote:

> Martin,
>
> This all makes sense.  The information that 人人、would be
> orthographically wrong is one of the important bits I wanted to
> confirm.  Based on your note and Yoneya-san's, I think we should
> get the iteration marks out of the CONTEXT category entirely,
> making the vertical ones DISALLOWED and the others that are in
> Lm PVALID.
>
> Thanks to you, Yoneya-san, and the others who have
> commented for your patience.
>
>    john
>
>
> --On Friday, April 10, 2009 11:24 +0900 "Martin J. Dürst"
> <duerst at it.aoyama.ac.jp> wrote:
>
> > Hello John,
> >
> > On 2009/04/09 19:10, John C Klensin wrote:
> >>
> >> --On Thursday, April 09, 2009 16:59 +0900 "Martin J. Dürst"
> >> <duerst at it.aoyama.ac.jp> wrote:
> >>
> >>> I understand that there is a desire to add some context
> >>> constraints for middle dot, but I don't understand why we
> >>> need constraints for Ideographic Iteration Mark. In my
> >>> opinion, the context given by Yoshiro is correct, but the
> >>> chance that this character gets confused with something else
> >>> is as big or as little as any other randomly picked
> >>> character, so I don't see why we would need context. Is it
> >>> that this is a punctuation character, that we can only
> >>> exceptionally include punctuation characters, and only if
> >>> they have context?
> >>
> >> Middle dot (U+30FB) is a punctuation character (Po), so it is
> >> allowed only by exception and, for the reasons mentioned
> >> earlier, it makes sense to make the exception as narrow as
> >> possible.
> >
> > Agreed.
> >
> >> I no longer remember why we treated U+3005 as requiring
> >> context. It is Lm in the tables, which brings it under
> >> Category A (Section 2.1) in Tables, so, absent other
> >> considerations, it ought to default to PVALID.  I note that
> >> there are several other iteration marks that are just PVALID.
> >> I imagine that U+3005 was called out for special treatment
> >> because the Unicode Standard identifies it as part of a "CJK
> >> Symbols and Punctuation" block (see page 830 of TUS 5.0). Its
> >> presence in the Contextual rule list may consequently be an
> >> artifact of the time in which we were still treating the
> >> Unicode block structure as significant.
> >>
> >> On a fast scan, there doesn't seem to be anything in
> >> Stringprep that calls it out for special treatment.  At least
> >> at the registry level, none of the iteration marks appear to
> >> be Preferred Variants for Chinese (see
> >> http://www.iana.org/domains/idn-tables/tables/cn_zh-cn_4.0.html
> >> or the identical table for .TW); some, but not all, of them
> >> appear in the .JP Preferred Variants list for Japanese (see
> >> http://www.iana.org/domains/idn-tables/tables/jp_ja-jp_1.2.html).
> >> .KR has filed only a Hangul table with IANA, so I can
> >> make no inferences there.
> >>
> >> So, if I can ask your indulgence to satisfy my curiosity and
> >> slightly reduce my ignorance,
> >>
> >>      (i) Are these iteration marks used with Japanese only
> >>      (out of the CJK script group)?
> >
> > I don't remember having seen it in Chinese, and I have seen
> > explicit character repetition in Chinese, but I rarely look at
> > Chinese (and don't read it), so that doesn't mean too much.
> > But
> > http://en.wiktionary.org/wiki/Category:Japanese-only_CJKV_Characters
> > also lists it as a Japanese-only character.
> >
> >>      (ii) How are they used?   It may be just an incorrect
> >>      inference from terminology, but, if I saw something
> >>      called an "iteration mark", I'd normally expect it to be
> >>      associated with a numeral that would tell me how many
> >>      copies of an associated character or string to infer.
> >
> > That's thinking too far. 々 (U+3005) is simply used to repeat
> > the previous character. So 人 (hito) means man, person and
> > 人々 (hitobito, note the assimilation from h to b) means
> > men, people (only used in certain cases; in general, 人 can
> > be used for plural, too). 人々 may originally have been written
> > 人人、but these days, that would be orthographically wrong.
> > There is no device e.g. for a threefold repetition, which is
> > not too surprising, because such repetitions don't occur in
> > practice. See also http://en.wiktionary.org/wiki/々 .
> >
> >>      (iii) Is there any possible reason why some of the
> >>      iteration marks should be treated as PVALID and others
> >>      should be CONTEXTO?
> >
> > Not as far as I can imagine. There are good reasons for
> > having some PVALID, and there are good reasons for having
> > others disallowed, but not CONTEXTO.
> >
> >>      (iv) If "vertical" really means that, is U+303B needed
> >>      in domain names at all?  Are they ever, in practice,
> >>      written vertically?  I note that the .JP table
> >>      (reference above) does not permit that character at all.
> >>      If it is not used, not useful, and could cause
> >>      conceptual confusion (can it?), then should it be
> >>      DISALLOWED rather than PVALID or CONTEXTO?
> >
> > I think Yoshiro already said that the vertical ones are not
> > needed and should be disallowed. That applies to all of
> > U+3031-3035. They are used only in vertical text, and
> > therefore don't work for domain names (which are usually
> > horizontal).
> >
> >
> >> I think that this takes us in the direction of removing U+3005
> >> and U+303B from the exception list, letting them fall into
> >> PVALID because of their Lm classification (unless U+303B
> >> should be DISALLOWED as discussed above).  But, to the extent
> >> possible, it would be good to understand a bit more about the
> >> situation first, even though this takes us rather far into the
> >> character-by-character analysis that we try to avoid.
> >
> > If we don't want to go too far with character-by-character
> > analysis, we can leave the business of excluding U+3031-3035
> > to registries.
> >
> > Regards,    Martin.
>
>
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
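
For reference, the outcome proposed in the thread (vertical iteration marks DISALLOWED, the remaining Lm iteration marks PVALID) can be sketched in Python. This is only an illustration under stated assumptions, not the derivation used in the Tables document: the function name proposed_status, the VERTICAL_MARKS set, and the "OUT OF SCOPE" label are made up for this sketch, and "the vertical ones" is read here as U+3031-3035 plus U+303B. General categories come from the Unicode data bundled with the Python interpreter.

```python
import unicodedata

# Assumption for this sketch: "the vertical ones" are U+3031..U+3035
# (vertical kana repeat marks) plus U+303B (vertical ideographic iteration mark).
VERTICAL_MARKS = set(range(0x3031, 0x3036)) | {0x303B}


def proposed_status(cp: int) -> str:
    """Status proposed in this thread for iteration marks (illustrative only)."""
    if cp in VERTICAL_MARKS:
        return "DISALLOWED"
    if unicodedata.category(chr(cp)) == "Lm":
        # Lm code points fall under Category A and default to PVALID.
        return "PVALID"
    # Anything else is decided by rules this sketch does not model.
    return "OUT OF SCOPE"


# Print code point, general category, proposed status, and character name.
for cp in (0x3005, 0x303B, 0x3031, 0x3035, 0x309D, 0x30FD):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.category(ch)} "
          f"{proposed_status(cp):<10} {unicodedata.name(ch)}")
```

The explicit exclusion is checked before the category test because the vertical marks are themselves Lm and would otherwise default to PVALID.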