[Idna-arabicscript] Re: ZWNJ and CONTEXT

Roozbeh Pournader roozbeh at gmail.com
Tue Apr 8 00:31:21 CEST 2008


On Mon, Apr 7, 2008 at 3:52 PM, Alireza Saleh <saleh at nic.ir> wrote:
> I have found 2 characters which are currently categorized as
> DUAL_JOIN in Unicode, so ZWNJ will be allowed according to the regexp, but
> a ZWNJ to the left of these characters is completely invisible in most
> cases.

It's not only those two characters, and it only happens in some fonts.
Some fonts were not designed for languages that use the ZWNJ; they were
designed for the Arabic language only. So they share glyphs among
different forms where they shouldn't. A good Naskhi Persian font will
distinguish visually between those cases.

>  The characters are
>  1) U+0637
>  2) U+0638
>  ط‌س ( U+0637 + ZWNJ + U+0633 )
>  طس ( U+0637 + U+0633 )
>
>
>  Would it be possible to edit the current regexp '/$L $T* ZWNJ $T* $R/' so
> that it supports individual characters, or does it only work with the
> joining definition of the character in the Unicode table?

Again, I believe this is very font-specific. Both of the letters you
used (Tah and Seen) may have very similar glyphs for the different forms
in your font(s). Here, they have indistinguishable glyphs for the initial
and isolated forms of Tah and indistinguishable glyphs for the final and
isolated forms of Seen. Even in your font, this would not have
happened if you had used Ain or Yeh, for example, instead of Seen.
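
For reference, the quoted rule '/$L $T* ZWNJ $T* $R/' only looks at
joining types, not at glyphs, which is why it cannot catch font-specific
cases like this one. A minimal sketch of such a check follows, assuming
$L means Left- or Dual-joining, $T means Transparent, and $R means
Right- or Dual-joining. The JOINING_TYPE table is a small hand-picked
subset for illustration only, and the function names are hypothetical.

```python
import re

ZWNJ = "\u200c"

# Assumption: a tiny hand-picked subset of Unicode Joining_Type values.
# A real check would load the full Joining_Type data instead.
JOINING_TYPE = {
    "\u0637": "D",  # TAH (dual-joining)
    "\u0638": "D",  # ZAH (dual-joining)
    "\u0633": "D",  # SEEN (dual-joining)
    "\u0639": "D",  # AIN (dual-joining)
    "\u064a": "D",  # YEH (dual-joining)
    "\u0627": "R",  # ALEF (right-joining)
    "\u062f": "R",  # DAL (right-joining)
    "\u064e": "T",  # FATHA (transparent)
}

# Context required before and after each ZWNJ, written over class letters.
BEFORE = re.compile(r"[LD]T*\Z")
AFTER = re.compile(r"T*[RD]")


def joining_classes(text):
    """Map each character to a class letter: 'Z' for ZWNJ, 'U' if unknown."""
    return "".join(
        "Z" if ch == ZWNJ else JOINING_TYPE.get(ch, "U") for ch in text
    )


def zwnj_contexts_ok(text):
    """Return True if every ZWNJ in text sits in an $L $T* ZWNJ $T* $R context."""
    classes = joining_classes(text)
    for i, cls in enumerate(classes):
        if cls != "Z":
            continue
        if not BEFORE.search(classes[:i]) or not AFTER.match(classes[i + 1:]):
            return False
    return True
```

With this table, TAH + ZWNJ + SEEN passes the check while ALEF + ZWNJ +
SEEN does not, since ALEF never joins to the following letter anyway.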

So this is not about certain letters, but about certain combinations of
letters in certain fonts. If you want to be extra secure, you need to
generate a table of all two-character combinations and check them in
all the fonts that you think your users may be using, so you can
handle the indistinguishable ones differently.
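
A rough sketch of generating such a table is below. It only produces
the string pairs (with and without ZWNJ); rendering them in each font
and comparing the results is left out. The LETTERS subset and the
pair_table name are assumptions for illustration, and only dual-joining
letters are included to keep it short.

```python
from itertools import product

ZWNJ = "\u200c"

# Assumption: an illustrative subset of dual-joining letters.
LETTERS = {
    "TAH": "\u0637",
    "ZAH": "\u0638",
    "SEEN": "\u0633",
    "AIN": "\u0639",
    "YEH": "\u064a",
}


def pair_table(letters):
    """Return (label, joined, separated) rows for every ordered letter pair."""
    rows = []
    for (name1, ch1), (name2, ch2) in product(letters.items(), repeat=2):
        rows.append((f"{name1}+{name2}", ch1 + ch2, ch1 + ZWNJ + ch2))
    return rows


# Print a tab-separated table to paste into a document set in the font
# under test; rows whose two strings look identical are the risky ones.
for label, joined, separated in pair_table(LETTERS):
    print(f"{label}\t{joined}\t{separated}")
```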

> There is also no specification for ZWJ for Arabic script. Are there any
> other resources which have some information about ZWJ usage for Arabic
> script?

It's in the Unicode 5.0 book, which has several parts about ZWJ. See, for
example, pages 270-271, 275-281, and 537-540. PDF versions of the chapters
are available from the left sidebar here:
http://www.unicode.org/versions/Unicode5.1.0/

Roozbeh

