U+200C rule

Simon Josefsson simon at josefsson.org
Sun Mar 20 11:53:13 CET 2011

Hi.  The rule for U+200C is:


      If Canonical_Combining_Class(Before(cp)) .eq.  Virama Then True;

      If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
         (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
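The following minimal Python sketch shows one possible reading of the rule. It assumes that each U+200C is checked on its own, and that the pattern is anchored so the \u200C in it is that particular occurrence (cp). This is an assumption, not necessarily the intended semantics. Python's standard library does not expose Joining_Type, so the small `JOINING_TYPE` table and the helpers `joining_type`, `zwnj_allowed_at` and `label_zwnj_ok` are hypothetical and for illustration only. A real implementation would take Joining_Type from the Unicode Character Database. The Virama test uses `unicodedata.combining`, which returns the Canonical_Combining_Class (9 for Virama).

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named "Virama"

# Hypothetical, tiny Joining_Type table for illustration only. Real code
# would load the data from the Unicode Character Database. Unlisted code
# points (including U+200C itself) are treated as U (non-joining) here.
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH (dual-joining)
    "\u0627": "R",  # ARABIC LETTER ALEF (right-joining)
    "\u064e": "T",  # ARABIC FATHA (transparent)
}


def joining_type(ch: str) -> str:
    return JOINING_TYPE.get(ch, "U")


def zwnj_allowed_at(label: str, i: int) -> bool:
    """Evaluate the U+200C rule for the single occurrence at label[i]."""
    assert label[i] == ZWNJ
    # First clause: the code point before this U+200C is a virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True
    # Second clause, anchored on this U+200C: skip transparent (T) code
    # points to the left and require L or D; skip T to the right and
    # require R or D.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def label_zwnj_ok(label: str) -> bool:
    """Under this reading, every U+200C must satisfy the rule by itself."""
    return all(zwnj_allowed_at(label, i)
               for i, ch in enumerate(label) if ch == ZWNJ)


print(label_zwnj_ok("\u0628\u200c\u0627"))        # True
print(label_zwnj_ok("\u0628\u200c\u0627\u200c"))  # False
```

Skipping transparent code points and then testing one neighbour on each side is equivalent to the anchored pattern, because T never overlaps with {L,D} or {R,D}. Under this reading, the trailing U+200C in the second label causes rejection even though the first U+200C is acceptable.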

I could not find any precise definition of how to implement RegExpMatch.

For example, consider a label that contains two U+200C characters, one used
in the permitted way and the other not.

A regexp match on that string (at least with regular expressions as
defined by POSIX, Emacs, Perl, etc., which all differ slightly) would
find the permitted occurrence and accept the whole label.
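The difference this question turns on can be sketched with Python's `re` module. The encoding below is hypothetical: each code point of a label is replaced by a letter naming its Joining_Type, and U+200C is written as `Z`. Only the RegExpMatch clause is modelled; the Virama clause is ignored. The function `naive_search` (a hypothetical name) accepts if the pattern matches anywhere in the label. The function `per_occurrence` (also hypothetical) requires every `Z` to sit at the centre of its own match.

```python
import re

# Hypothetical encoding for illustration: one letter per code point giving
# its Joining_Type (L, D, R, T, U), with U+200C itself written as "Z".
RULE = re.compile(r"[LD]T*ZT*[RD]")


def naive_search(types: str) -> bool:
    """Accept if the pattern matches anywhere in the label."""
    return RULE.search(types) is not None


def per_occurrence(types: str) -> bool:
    """Accept only if every Z is itself the centre of a match."""
    for i, ch in enumerate(types):
        if ch != "Z":
            continue
        # Left context must end in L or D followed only by T's.
        left_ok = re.search(r"[LD]T*\Z", types[:i]) is not None
        # Right context must start with T's followed by R or D.
        right_ok = re.match(r"T*[RD]", types[i + 1:]) is not None
        if not (left_ok and right_ok):
            return False
    return True


# One permitted Z (between D and R) and one that is not (at the end).
label = "DZRZ"
print(naive_search(label))    # True: the first Z satisfies the pattern
print(per_occurrence(label))  # False: the trailing Z has nothing after it
```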

Is this the intention?

If not, what is the intended way to implement RegExpMatch?

