[Idna-arabicscript] Re: ZWNJ and CONTEXT

Alireza Saleh saleh at nic.ir
Tue Apr 8 10:19:35 CEST 2008


Dear Roozbeh,

Thanks for your reply. I totally agree with you that it is font
related, but in common Persian usage there is no significant
difference, just a very small curve, between the left-joined and
isolated forms of the characters I mentioned. I just want to check
whether a character-based regexp would be possible in the protocol.
I also fully support the idea that font problems have nothing to do
with IDN, but I think it is good to check all the possibilities to
protect users, not only at the registry level but also at the
protocol level.

Alireza


Roozbeh Pournader wrote:
> On Mon, Apr 7, 2008 at 3:52 PM, Alireza Saleh <saleh at nic.ir> wrote:
>> I have found 2 characters which are currently categorized as
>> DUAL_JOIN in Unicode, so ZWNJ will be allowed according to the regexp,
>> but having ZWNJ at the left of these characters is completely invisible
>> in most cases.
>
> It's not only those two characters, but it only happens with some
> fonts. Some fonts were not designed for languages that use the ZWNJ;
> they were designed for the Arabic language only. So they share glyphs
> among different forms where they shouldn't. A good Naskhi Persian font
> will distinguish visually between those cases.
>
>> The characters are:
>> 1) U+0637 (ARABIC LETTER TAH)
>> 2) U+0638 (ARABIC LETTER ZAH)
>> ط‌س (U+0637 + ZWNJ + U+0633)
>> طس (U+0637 + U+0633)
>>
>>
>> Would it be possible to edit the current regexp '/$L $T* ZWNJ $T* $R/'
>> so that it supports individual characters, or does it only work with
>> the joining definition of each character in the Unicode table?
>
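The contextual rule quoted above can be sketched in Python as follows. This is a minimal illustration, not normative protocol text: it assumes `$L` matches a character whose Joining_Type is L or D, `$R` one whose Joining_Type is R or D, and `$T` a transparent (T) character, which is how the ZWNJ rule later published in RFC 5892 reads. The `JOINING_TYPE` table is a hypothetical hand-picked subset of ArabicShaping.txt, and `EXCLUDED_BEFORE_ZWNJ` is a hypothetical per-character exception of the kind asked about here; it is not part of any IDNA specification.

```python
import re

ZWNJ = "\u200c"

# Hypothetical subset of Unicode Joining_Type values (from ArabicShaping.txt).
# A real implementation would load the complete data file.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0637": "D",  # ARABIC LETTER TAH
    "\u0638": "D",  # ARABIC LETTER ZAH
    "\u0639": "D",  # ARABIC LETTER AIN
    "\u064e": "T",  # ARABIC FATHA (transparent)
}

# Hypothetical character-based exception: letters after which a ZWNJ is
# refused even though their joining type would allow it.
EXCLUDED_BEFORE_ZWNJ = {"\u0637", "\u0638"}  # TAH, ZAH


def joining_types(label):
    """Map each character to one type letter; 'Z' marks ZWNJ, unlisted -> 'U'."""
    return "".join("Z" if ch == ZWNJ else JOINING_TYPE.get(ch, "U") for ch in label)


def zwnj_context_ok(label, excluded=frozenset()):
    """Return True if every ZWNJ in label sits in a $L $T* ZWNJ $T* $R context."""
    types = joining_types(label)  # same length as label, index for index
    for i, t in enumerate(types):
        if t != "Z":
            continue
        before, after = types[:i], types[i + 1:]
        # $L $T* must end right before the ZWNJ; $T* $R must start right after it.
        left = re.search(r"[LD]T*$", before)
        if not (left and re.match(r"T*[RD]", after)):
            return False
        # Optional character-based check on the letter that matched $L.
        if label[left.start()] in excluded:
            return False
    return True
```

With the joining-type rule alone, U+0637 + ZWNJ + U+0633 is accepted, which is exactly the case raised above. Passing `EXCLUDED_BEFORE_ZWNJ` rejects it, at the cost of also rejecting Tah or Zah followed by ZWNJ in fonts where the difference is clearly visible.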
> Again, I believe this is very font-specific. Both of the letters you
> used (Tah and Seen) may have very similar glyphs for the different
> forms in your font(s). Here, they have indistinguishable glyphs for the
> initial and isolated forms of Tah and indistinguishable glyphs for the
> final and isolated forms of Seen. Even in your font, this would not
> have happened if you had used Ain or Yeh, for example, instead of Seen.
>
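The point about glyph forms can be made concrete by working out which positional form each letter takes. The sketch below is a simplified model under stated assumptions: a hypothetical subset of joining types, no handling of transparent marks or join-causing characters such as ZWJ and Tatweel, and the rule that two adjacent characters connect when the first is L or D and the second is R or D, with ZWNJ (joining type U) breaking the connection. It shows that inserting ZWNJ between Tah and Seen turns initial Tah into isolated Tah and final Seen into isolated Seen, so the two strings can look identical only in a font where both of those form pairs share glyphs.

```python
# Hypothetical subset of Joining_Type values; anything unlisted (including
# ZWNJ itself) is treated as non-joining (U).
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0637": "D",  # ARABIC LETTER TAH
    "\u0639": "D",  # ARABIC LETTER AIN
    "\u064a": "D",  # ARABIC LETTER YEH
}


def jt(ch):
    return JOINING_TYPE.get(ch, "U")


def joins(first, second):
    """True if first (earlier in logical order) connects to second."""
    return jt(first) in ("D", "L") and jt(second) in ("D", "R")


def positional_forms(text):
    """Return (code point, form) for each joining character in text."""
    forms = []
    for i, ch in enumerate(text):
        if jt(ch) == "U":
            continue  # ZWNJ and unlisted characters get no form here
        to_prev = i > 0 and joins(text[i - 1], ch)
        to_next = i + 1 < len(text) and joins(ch, text[i + 1])
        if to_prev and to_next:
            form = "medial"
        elif to_prev:
            form = "final"
        elif to_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((f"U+{ord(ch):04X}", form))
    return forms
```

For example, `positional_forms("\u0637\u0633")` gives initial Tah and final Seen, while `positional_forms("\u0637\u200c\u0633")` gives isolated forms of both. Replacing Seen with Ain makes the ZWNJ switch Ain from final to isolated, a pair that most fonts draw very differently.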
> So this is not about certain letters, but about certain combinations of
> letters in certain fonts. If you want to be extra secure, you need to
> generate a table of all two-character combinations and check them in
> all the fonts that you think your users may be using, so that you can
> treat the problematic combinations differently.
>
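The table suggested above can be generated mechanically. This is a minimal sketch, assuming a hypothetical, hand-picked list of Persian letters with their joining types (a real run would cover the full repertoire the registry allows). It yields every two-letter pair in which the joining-type rule would permit a ZWNJ, once with and once without the ZWNJ, so each pair can be compared side by side in the fonts users are likely to have. It does not render or compare glyphs itself.

```python
from itertools import product

ZWNJ = "\u200c"

# Hypothetical subset of Persian letters mapped to their Joining_Type.
LETTERS = {
    "\u0633": "D",  # ARABIC LETTER SEEN
    "\u0637": "D",  # ARABIC LETTER TAH
    "\u0638": "D",  # ARABIC LETTER ZAH
    "\u0639": "D",  # ARABIC LETTER AIN
    "\u06cc": "D",  # ARABIC LETTER FARSI YEH
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062f": "R",  # ARABIC LETTER DAL
}


def zwnj_test_pairs(letters):
    """Yield (with_zwnj, without_zwnj) for each pair where ZWNJ is permitted."""
    for first, second in product(letters, repeat=2):
        # Keep only pairs the contextual rule allows: first L/D, second R/D.
        if letters[first] in ("L", "D") and letters[second] in ("R", "D"):
            yield first + ZWNJ + second, first + second


def describe(text):
    """Code-point listing, e.g. 'U+0637 U+200C U+0633'."""
    return " ".join(f"U+{ord(c):04X}" for c in text)
```

Writing each pair and its `describe()` listing to a UTF-8 text file, one pair per line, gives a test sheet that can be opened in each candidate font; the pairs that look the same are the ones to handle specially.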
>> There is also no specification for ZWJ in Arabic script. Are there any
>> other resources with information about ZWJ usage in Arabic script?
>
> It's in the Unicode 5.0 book; there are parts about ZWJ. See, for
> example, pages 270-271, 275-281, and 537-540. PDF versions of the
> chapters are available on the left sidebar here:
> http://www.unicode.org/versions/Unicode5.1.0/
>
> Roozbeh
> ------------------------------------------------------------------------
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update



