[0-9\-] in Single-script Contextual Rules

Mark Davis ⌛ mark at macchiato.com
Fri Jul 24 19:45:04 CEST 2009


What may make sense, for this and the Katakana middle dot, is to just have
rules like the following. That is, do not test for adjacency, and do not
require every character to be in the script, but require the label to have
*at least* one character of the appropriate script. That gives the following:

GREEK LOWER NUMERAL SIGN:

Rule Set:
     False;
     For All Characters:
        If Script(cp) .eq. Greek Then True;
     End For;

KATAKANA MIDDLE DOT:

Rule Set:
     False;
     For All Characters:
        If Script(cp) .in. {Hiragana, Katakana, Han} Then True;
     End For;
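
As a rough illustration of the two rule sets above, here is a minimal Python
sketch. The SCRIPT_RANGES table and the helper names (script, has_script,
keraia_ok, middle_dot_ok) are hypothetical and exist only for this example.
The table is a deliberately partial approximation of the Unicode Script
property, since the Python standard library does not expose that property
directly; a real implementation would read Scripts.txt.

```python
# Hypothetical, heavily simplified script table: (first, last, script).
# It covers only a few letter ranges; real code should use Scripts.txt.
SCRIPT_RANGES = [
    (0x0388, 0x03E1, "Greek"),     # core Greek letters only (approximation)
    (0x3041, 0x3096, "Hiragana"),
    (0x30A1, 0x30FA, "Katakana"),
    (0x4E00, 0x9FFF, "Han"),       # CJK Unified Ideographs block only
]

def script(ch):
    """Return the approximate script name of one character."""
    cp = ord(ch)
    for first, last, name in SCRIPT_RANGES:
        if first <= cp <= last:
            return name
    return "Other"

def has_script(label, scripts):
    """Start at False; become True if any character is in one of `scripts`."""
    return any(script(ch) in scripts for ch in label)

def keraia_ok(label):
    # GREEK LOWER NUMERAL SIGN (U+0375): at least one Greek character.
    return has_script(label, {"Greek"})

def middle_dot_ok(label):
    # KATAKANA MIDDLE DOT (U+30FB): at least one Hiragana, Katakana or Han.
    return has_script(label, {"Hiragana", "Katakana", "Han"})
```

Under this sketch a label that mixes Greek letters with digits or
hyphen-minus still passes, because the rule only asks for one character of
the right script rather than checking every character.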

We should also apply this to the Hebrew cases; the problem with looking at
the character before is that if someone uses combining marks on the preceding
letter (not common for Hebrew itself, but it does occur in other
orthographies), then the label would be improperly disallowed. Yet it is not
worth having a convoluted test for that. So we could apply the same mechanism
to those two cases as well:

HEBREW PUNCTUATION GERESH:
HEBREW PUNCTUATION GERSHAYIM:

Rule Set:
     False;
     For All Characters:
        If Script(cp) .eq. Hebrew Then True;
     End For;
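
To make the combining-mark problem concrete, here is a small Python sketch
comparing a preceding-character test with the at-least-one test. Everything
in it is an assumption for illustration: the script function is a
hypothetical two-range stand-in for the Unicode Script property (it treats
the whole U+0300..U+036F block as Inherited, which is only approximately
right), and preceded_by_hebrew is one possible reading of "looking at the
character before", not a quotation of the draft.

```python
GERESH = "\u05F3"  # HEBREW PUNCTUATION GERESH

def script(ch):
    """Hypothetical, simplified stand-in for the Unicode Script property."""
    cp = ord(ch)
    if 0x05D0 <= cp <= 0x05EA:    # Hebrew letters alef..tav
        return "Hebrew"
    if 0x0300 <= cp <= 0x036F:    # combining marks, treated as Inherited here
        return "Inherited"
    return "Other"

def preceded_by_hebrew(label, i):
    """Adjacency test: the character just before position i is Hebrew."""
    return i > 0 and script(label[i - 1]) == "Hebrew"

def any_hebrew(label):
    """At-least-one test: some character in the label is Hebrew."""
    return any(script(ch) == "Hebrew" for ch in label)

# Alef, then COMBINING DOT ABOVE (U+0307), then geresh.
label = "\u05D0\u0307" + GERESH
i = label.index(GERESH)

adjacency_result = preceded_by_hebrew(label, i)   # the mark sits in between
at_least_one_result = any_hebrew(label)           # the alef is enough
```

In this sketch the adjacency test rejects the label because the combining
mark, not the letter, immediately precedes the geresh, while the
at-least-one test accepts it.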

Mark


On Fri, Jul 24, 2009 at 07:35, Wil Tan <wil at cloudregistry.net> wrote:

> Hi all,
>
> Several of the contextual rules specify that the label must only
> contain a certain script (e.g. Greek, Cyrillic). However, I believe
> that in some cases, the use of [0-9] and Hyphen-minus, all of which
> are in the "Zyyy" script, is often permitted and makes sense. For
> example,
>
> Appendix A.5. GREEK LOWER NUMERAL SIGN (KERAIA)
>   Code point:
>      U+0375
>   Overview:
>      Greek script only.
>   Lookup:
>      False
>   Rule Set:
>      True;
>      For All Characters:
>         If Script(cp) .ne.  Greek Then False;
>      End For;
>
> I wonder if we are being too restrictive here. I note that the .gr
> registry allows 0-9 in their IDN policies.
>
> Perhaps we should change the rule to the following?
>
>  True;
>  For All Characters:
>    If Script(cp) .ne. Greek And cp Not .in. 002D,0030..0039 Then False;
>  End For;
>
>
> Similar treatment may also be warranted in A.6 Combining Cyrillic Titlo?
>
> =wil