PR-96, ZWNJ and Arabic, again....
Harald Alvestrand
hta at google.com
Fri Dec 22 09:33:57 CET 2006
Sigh. The more times I go over this, the less I understand.
The PR-96 text says:
  1. *Breaking a cursive connection.* That is, in the context based on
     the Arabic Shaping property, consisting of:
       * A Right-Joining character, followed by zero or more
         Transparent characters, followed by a ZWNJ, followed by zero
         or more Transparent characters, followed by a Left-Joining
         character
       * As a regular expression:
             /$R $T* ZWNJ $T* $L/
         where:
           o $T = [:Joining_Type=Transparent:]
           o $R = [[:Joining_Type=Dual_Joining:][:Joining_Type=Right_Joining:]]
           o $L = [[:Joining_Type=Dual_Joining:][:Joining_Type=Left_Joining:]]
       * Example: Farsi <Noon, Alef, Meem, Heh, Alef, Farsi Yeh>.
         Without a ZWNJ, it translates to "names"; with a ZWNJ
         between Heh and Alef, it means "a letter".
Straightforward?
Not quite. Of those characters, Alef is RIGHT-joining; all the others
are dual-joining.
So the pattern $R $T* ZWNJ $T* $L will NOT match the sequence "Heh ZWNJ
Alef": Heh satisfies $R, but Alef is neither dual-joining nor
left-joining, so it cannot satisfy $L.
Either the example string is given in visual order, the regexp is
intended to be read as visual order, something is wrong, or I'm
horribly confused.
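For what it's worth, the mismatch can be checked mechanically. Below is a minimal sketch in Python, not anything taken from PR-96 itself. It assumes the Joining_Type values in the table, which I copied by hand from the Unicode ArabicShaping.txt data because Python's unicodedata module does not expose that property. The names JOINING_TYPE, type_string and matches_pr96 are made up for illustration. Each character is mapped to a one-letter joining type (with "Z" standing in for ZWNJ), and the PR-96 pattern is then applied to that string of letters.

```python
import re

# Assumed Joining_Type values, hand-copied from ArabicShaping.txt:
# D = Dual_Joining, R = Right_Joining.
JOINING_TYPE = {
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06CC": "D",  # ARABIC LETTER FARSI YEH
}
ZWNJ = "\u200C"


def type_string(text):
    """Map each character to its joining-type letter; 'Z' marks ZWNJ,
    and anything not in the table is treated as 'U' (non-joining)."""
    return "".join(
        "Z" if ch == ZWNJ else JOINING_TYPE.get(ch, "U") for ch in text
    )


# /$R $T* ZWNJ $T* $L/ rewritten over the joining-type letters.
PR96_PATTERN = re.compile(r"[RD]T*ZT*[LD]")


def matches_pr96(text):
    """True if the PR-96 pattern, exactly as written, matches somewhere."""
    return PR96_PATTERN.search(type_string(text)) is not None


# <Noon, Alef, Meem, Heh, ZWNJ, Alef, Farsi Yeh> in logical (typing) order.
logical = "\u0646\u0627\u0645\u0647" + ZWNJ + "\u0627\u06CC"
# The same characters reversed, standing in for a "visual order" reading.
visual = logical[::-1]

print(type_string(logical), matches_pr96(logical))
print(type_string(visual), matches_pr96(visual))
```

Under those assumptions the pattern fails on the logical-order string and succeeds on the reversed one, which is consistent with the "visual order" guesses above but does not settle which reading was intended.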
Help?