ZWNJ and CONTEXT
kenw at sybase.com
Tue Apr 8 00:59:17 CEST 2008
Alireza Saleh wrote:
> Thanks. I have checked the link again. It covers most issues
> related to ZWNJ usage. I have found 2 characters which are currently
> categorized as DUAL_JOIN in Unicode, so ZWNJ will be allowed according to
> the regexp, but a ZWNJ to the left of these characters has no visible
> effect in most cases.
> The characters are
> 1) U+0637
> 2) U+0638
> ط‌س ( U+0637 + ZWNJ + U+0633 )
> طس ( U+0637 + U+0633 )
Well, most cases, perhaps. But TAH can form ligatures with
a following MEEM, HAH, YEH, or ALEF MAKSURA, at least, and a
ZWNJ would break those ligatures.
But I don't think the main reason to include ZWNJ as CONTEXTJ
for IDNA is its general use to produce special noncursive display
effects for isolated Arabic characters. Rather, it is there to
account for its use to make certain required distinctions,
in Persian in particular.
I would expect, when it comes down to actual registry policies,
that it would make sense for Persian to only allow registrations
where the ZWNJ actually is used in Persian words. And for
the Arabic language, registrations with ZWNJ should just be
rejected, because they aren't needed for Arabic.
> Would it be possible to edit the current regexp '/$L $T* ZWNJ $T* $R/'
> so that it supports individual characters, or does it only work with
> the joining definition of the character in the Unicode table?
If you think of it like above, then there really is no reason for
the protocol context rule to try to get more explicit with particular
character combinations. It would just make it that much more
complicated, for no real net gain. This rule is really just a
general filter that rules out ZWNJ in meaningless contexts
in IDNs, but ultimately once the protocol converts what passes
that rule into punycode, you are going to rely on the lookup
in the registry for an actual match.
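
To make the shape of that filter concrete, here is a minimal Python sketch of the quoted regexp only, not of any complete IDNA rule set. It assumes $L means joining type L or D, $R means R or D, and $T means T. The JOINING_TYPE table is a small hand-picked excerpt for illustration, and the names zwnj_allowed and to_punycode are hypothetical helpers, not part of any library. Python's built-in "punycode" codec is used only to show that labels with and without ZWNJ encode differently, which is why the registry lookup ends up making the final distinction.

```python
import re

ZWNJ = "\u200c"

# Illustrative excerpt of Unicode joining types (assumption: a real
# implementation would load the full joining-type data instead).
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u062f": "R",  # DAL
    "\u0633": "D",  # SEEN
    "\u0637": "D",  # TAH
    "\u0638": "D",  # ZAH
    "\u0645": "D",  # MEEM
    "\u064e": "T",  # FATHA (transparent combining mark)
}


def joining_classes(label: str) -> str:
    """Map each character to one letter: L, R, D, T, Z (ZWNJ) or U (other)."""
    return "".join(
        "Z" if ch == ZWNJ else JOINING_TYPE.get(ch, "U") for ch in label
    )


def zwnj_allowed(label: str) -> bool:
    """Check every ZWNJ in the label against /$L $T* ZWNJ $T* $R/."""
    classes = joining_classes(label)
    for i, cls in enumerate(classes):
        if cls != "Z":
            continue
        before_ok = re.search(r"[LD]T*$", classes[:i]) is not None
        after_ok = re.match(r"T*[RD]", classes[i + 1:]) is not None
        if not (before_ok and after_ok):
            return False
    return True  # also true when the label contains no ZWNJ at all


def to_punycode(label: str) -> str:
    """Encode a single label with Python's built-in punycode codec."""
    return label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    with_zwnj = "\u0637" + ZWNJ + "\u0633"   # TAH + ZWNJ + SEEN
    without_zwnj = "\u0637\u0633"            # TAH + SEEN
    print(zwnj_allowed(with_zwnj))
    print(to_punycode(with_zwnj) != to_punycode(without_zwnj))
```

Note that TAH + ZWNJ + SEEN passes this filter even though the ZWNJ may be invisible there. That is the point made above: the rule only screens out meaningless contexts, and the two labels still become different punycode strings that the registry must match exactly.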
> There is also no specification for ZWJ for Arabic script. Are there any
> other resources which have some information about ZWJ usage for Arabic
I don't think there is any required context in the Arabic script
where use of a ZWJ would make a distinction that would be needed