U+200C rule
Patrik Fältström
patrik at frobbit.se
Sun Mar 20 14:20:12 CET 2011
On 20 mar 2011, at 11.53, Simon Josefsson wrote:
> Hi. The rule for U+200C is:
>
> False;
>
> If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
>
> If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
>    (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
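To make the rule concrete, here is a minimal sketch of it evaluated for one U+200C at a time. It assumes a tiny, hand-copied Joining_Type table (`JOINING_TYPE`, three Arabic code points only). A real implementation would load the full Unicode joining data. Treating every unlisted code point as "U" is a simplification made only for this sketch. Canonical_Combining_Class comes from Python's `unicodedata.combining`, where Virama is class 9. The names `joining_type` and `zwnj_context_ok` are hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA = 9  # Canonical_Combining_Class value of Virama

# Hypothetical, hand-copied subset of Joining_Type (from ArabicShaping.txt).
# Unlisted code points are treated as "U"; this is a simplification.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH: dual-joining
    "\u0627": "R",  # ARABIC LETTER ALEF: right-joining
    "\u064e": "T",  # ARABIC FATHA: transparent
}

def joining_type(ch):
    return JOINING_TYPE.get(ch, "U")

def zwnj_context_ok(label, i):
    """Evaluate the U+200C rule for the single occurrence at label[i]."""
    if label[i] != ZWNJ:
        raise ValueError("label[i] is not U+200C")
    # Clause 1: the code point immediately before is a Virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA:
        return True
    # Clause 2: (L|D) T* U+200C T* (R|D), anchored on this occurrence.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1  # skip transparent code points to the left
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False  # the rule's initial value, False, stands
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1  # skip transparent code points to the right
    return k < len(label) and joining_type(label[k]) in ("R", "D")

print(zwnj_context_ok("\u0628\u064e\u200c\u0627", 2))  # True: BEH FATHA ZWNJ ALEF
print(zwnj_context_ok("\u0915\u094d\u200c\u0937", 2))  # True: after Devanagari virama
print(zwnj_context_ok("\u0627\u200c\u0628", 1))        # False: ALEF is R, not L or D
```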
>
> I could not find any precise definition of how to implement RegExpMatch.
>
> For example, consider a label that contains two U+200C, where one of the
> U+200C is used in the permitted way, and the other is not.
>
> A regexp match on that string -- at least with regular expressions as
> defined by POSIX, Emacs, Perl, etc, which are all slightly different --
> would find the positive usage and permit the label.
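The problem can be reproduced with a small sketch. Each code point is replaced by a one-letter Joining_Type code, with "Z" standing in for U+200C. This encoding is chosen here purely for illustration. The rule's expression is then run through Python's unanchored `re.search`. The name `naive_accepts` is hypothetical.

```python
import re

# One letter per code point: its Joining_Type (L, D, R, T, U),
# with "Z" standing in for U+200C. This encoding is illustrative only.
PATTERN = re.compile(r"[LD]T*ZT*[RD]")

def naive_accepts(types):
    # Unanchored search: succeeds if ANY substring matches.
    return PATTERN.search(types) is not None

# The first U+200C sits between D and R (allowed).
# The second sits between two U code points (not allowed).
print(naive_accepts("DZRUZU"))  # True, although the second U+200C is invalid
```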
>
> Is this the intention?
No
> If not, what is the intended way to implement RegExpMatch?
The expression tries to say that, around _each_ \u200C, you need the following:
- One codepoint with either Joining_Type L or D
- Zero or more codepoints with Joining_Type T
- The \u200C
- Zero or more codepoints with Joining_Type T
- One codepoint with either Joining_Type R or D
As written, the regexp does not take into account a string that contains more than one \u200C.
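One way to read this answer is to evaluate the expression separately for every U+200C, anchored on that occurrence, and to accept the label only if every occurrence passes. The sketch below uses the same illustrative encoding: one Joining_Type letter per code point, with "Z" for U+200C. It covers only the RegExpMatch clause. The Virama clause would be checked per occurrence in the same way. The name `zwnj_regexp_ok` is hypothetical.

```python
import re

# Illustrative encoding: one Joining_Type letter per code point, "Z" for U+200C.
LEFT = re.compile(r"[LD]T*$")  # must end immediately before this U+200C
RIGHT = re.compile(r"T*[RD]")  # must start immediately after this U+200C

def zwnj_regexp_ok(types):
    """True only if EVERY U+200C has (L|D) T* before it and T* (R|D) after it."""
    for i, t in enumerate(types):
        if t != "Z":
            continue
        before, after = types[:i], types[i + 1:]
        if not (LEFT.search(before) and RIGHT.match(after)):
            return False  # one bad occurrence rejects the whole label
    return True

print(zwnj_regexp_ok("DZRUZU"))    # False: the second U+200C fails
print(zwnj_regexp_ok("DTZTRDZR"))  # True: both occurrences pass
```

Anchoring each check on one occurrence prevents a valid U+200C elsewhere in the label from masking an invalid one.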
Patrik
More information about the Idna-update mailing list