PR-96, ZWNJ and Arabic, again....
    Harald Alvestrand 
    hta at google.com
       
    Fri Dec 22 09:33:57 CET 2006
    
    
  
Sigh. The more times I go over this, the less I understand.
The PR-96 text says:
   1. *Breaking a cursive connection.* That is, in the context based on
      the Arabic Shaping property, consisting of:
          * A Right-Joining character, followed by zero or more
            Transparent characters, followed by a ZWNJ, followed by zero
            or more Transparent characters, followed by a Left-Joining
            character
          * As a regular expression:
            /$R $T* ZWNJ $T* $L/
            where:
             
                o $T = [:Joining_Type=Transparent:]
                o $R = [[:Joining_Type=Dual_Joining:][:Joining_Type=Right_Joining:]]
                o $L = [[:Joining_Type=Dual_Joining:][:Joining_Type=Left_Joining:]]
                   
          * Example: Farsi <Noon, Alef, Meem, Heh, Alef, Farsi Yeh>.
            Without a ZWNJ, it translates to "names"; with a ZWNJ
            between Heh and Alef, it means "a letter".
             
Straightforward?
Not quite. Of those characters, Alef is RIGHT-joining; all the others 
are dual-joining.
So the pattern $R $T* ZWNJ $T* $L will NOT match the sequence "Heh ZWNJ 
Alef".
Either the example string is given in visual order, the regexp is 
intended to be read in visual order, something is wrong, or I'm 
horribly confused.
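
To make the mismatch concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the code points chosen for the example letters and their Joining_Type values are hard-coded by hand (the PR-96 text names the letters but gives no code points), and the single-letter class encoding (D, R, L, T, Z for ZWNJ) is a convenience made up for this check. The second pattern, with $R and $L swapped, is not from PR-96; it is included only to show what a "read it the other way round" interpretation would do.

```python
import re

# Joining_Type values hard-coded for this example only (assumed values;
# Python's unicodedata module does not expose Joining_Type).
JOINING_TYPE = {
    "\u0646": "D",  # ARABIC LETTER NOON, Dual_Joining
    "\u0627": "R",  # ARABIC LETTER ALEF, Right_Joining
    "\u0645": "D",  # ARABIC LETTER MEEM, Dual_Joining
    "\u0647": "D",  # ARABIC LETTER HEH, Dual_Joining
    "\u06CC": "D",  # ARABIC LETTER FARSI YEH, Dual_Joining
}
ZWNJ = "\u200C"


def classes(text):
    """Map each character to one letter: D, R, L, T, Z (ZWNJ) or U (unknown)."""
    return "".join("Z" if ch == ZWNJ else JOINING_TYPE.get(ch, "U") for ch in text)


# The PR-96 pattern as written: $R $T* ZWNJ $T* $L,
# where $R = Dual or Right joining and $L = Dual or Left joining.
AS_WRITTEN = re.compile(r"[DR]T*ZT*[DL]")

# Hypothetical variant with $R and $L swapped (not in the PR-96 text).
SWAPPED = re.compile(r"[DL]T*ZT*[DR]")

# <Noon, Alef, Meem, Heh, ZWNJ, Alef, Farsi Yeh> in logical order.
word = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06CC"
cls = classes(word)

print(cls)
print("as written:", bool(AS_WRITTEN.search(cls)))
print("swapped:   ", bool(SWAPPED.search(cls)))
```

With these assumed values, the pattern as written finds nothing in the logical-order string, because the Alef after the ZWNJ is Right_Joining and so is not in $L; the swapped variant does match "Heh ZWNJ Alef".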
Help?
    
    