confusing notation in the ZERO WIDTH NON-JOINER contextual rule

Patrik Fältström paf at frobbit.se
Thu Aug 9 11:06:12 CEST 2012


On 6 Aug 2012, at 04:31, debug at test1.org wrote:

> RFC5892 contains the following rule about the contextual validity of U+200C:
> 
>> If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
>>        (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
> 
> By intuition, I understand that "\u200C" within the regex means the code
> point in question. So, a feasible interpretation would be:
> 
> (*) The code point MUST occur between Joining_Type:{L,D} and
> Joining_Type:{R,D}, where arbitrary occurrences of Joining_Type:T MAY be
> in between.

Correct.

> On the other hand, the statement literally defines just a regex that
> should match the string somewhere (with no reference to "cp" as in other
> rules), such that the rule would already be satisfied if any U+200C
> fulfills the requirement.

That is also a correct literal reading of the text, but it is the wrong interpretation.

I have submitted an erratum:

OLD:

In A:

Code point:

The code point, or code points, to which this rule is to be
applied.  Normally, this implies that if any of the code points in
a label is as defined, then the rules should be applied.  If
evaluated to True, the code point is OK as used; if evaluated to
False, it is not OK.

In A.1:

Rule Set:
  False;
  If Canonical_Combining_Class(Before(cp)) .eq.  Virama Then True;
  If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
    (Joining_Type:T)*(Joining_Type:{R,D})) Then True;

NEW:

In A:

Code point:

The code point, or code points, to which this rule is to be
applied.  Normally, this implies that if any of the code points in
a label is as defined, then the rules should be applied.  If
evaluated to True, the code point is OK as used; if evaluated to
False, it is not OK.

For the rule to be evaluated to True for the label, it MUST be
evaluated to True for every occurrence of Code point in the
label.

In A.1:

Rule Set:
  False;
  If Canonical_Combining_Class(Before(cp)) .eq.  Virama Then True;
  If cp .eq. \u200C .and. RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*cp
    (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
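
As an illustration of the NEW text, here is a minimal Python sketch of a per-occurrence check, where the regular expression is anchored at each cp rather than matched anywhere in the label. It is not taken from RFC 5892 or from any existing IDNA library. The function names and the TOY_JOINING_TYPE table are hypothetical; the table covers only three Arabic code points, and a real implementation would need the full Unicode Joining_Type data, which Python's unicodedata module does not provide.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # Canonical_Combining_Class value whose name is Virama

# Assumption: a tiny hand-written Joining_Type table for illustration only.
# Anything not listed is treated as "U" (Non_Joining).
TOY_JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH (Dual_Joining)
    "\u0627": "R",  # ARABIC LETTER ALEF (Right_Joining)
    "\u064e": "T",  # ARABIC FATHA (Transparent)
}


def joining_type(ch):
    """Hypothetical lookup; returns 'U' for code points not in the toy table."""
    return TOY_JOINING_TYPE.get(ch, "U")


def zwnj_ok_at(label, i):
    """Evaluate the rule set for the ZWNJ at index i of label."""
    # If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC:
        return True

    # (Joining_Type:{L,D})(Joining_Type:T)* immediately before cp
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False

    # (Joining_Type:T)*(Joining_Type:{R,D}) immediately after cp
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def label_zwnj_ok(label):
    """True only if the rule holds for every occurrence of U+200C."""
    return all(
        zwnj_ok_at(label, i) for i, ch in enumerate(label) if ch == ZWNJ
    )
```

With this sketch, a label such as BEH ZWNJ ALEF ZWNJ is rejected because the second U+200C has no following R or D character, even though the first occurrence alone would satisfy a regex matched "somewhere" in the string.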

  Patrik


