ZWNJ and CONTEXT
saleh at nic.ir
Tue Apr 8 10:33:31 CEST 2008
Dear Ken,
Yes, I got your points. I also talked with John specifically about including
ZWNJ, which is necessary for Persian. Now it is going to be available, but
we at least need to add it in a safe way that does not cause confusion
problems or a major backward incompatibility with the current
implementation of IDN for the Persian language. Most of the problems can be
resolved at the registry level, but some of them are out of the registry's
scope, and it would be useful to have good support for them
within the protocol if required. Would it be possible to have a rule
in CONTEXT(J/O) that addresses combinations of individual characters, in
addition to the current suggestion, which works on their categories?
Kenneth Whistler wrote:
> Alireza Saleh wrote:
>> Thanks. I have checked the link again. It covers most of the issues
>> related to ZWNJ usage. I have found 2 characters which are currently
>> categorized as DUAL_JOIN in Unicode, so ZWNJ will be allowed according to
>> the regexp, but a ZWNJ to the left of these characters is
>> completely invisible in most cases.
>> The characters are
>> 1) U+0637
>> 2) U+0638
>> ط‌س (U+0637 + ZWNJ + U+0633)
>> طس (U+0637 + U+0633)
> Well, most cases, perhaps. But TAH can form ligatures with
> a following MEEM, HAH, YEH, or ALEF MAKSURA, at least, and a
> ZWNJ would break those ligatures.
> But I don't think the main reason to include ZWNJ as CONTEXTJ
> for IDNA is because of its general use to produce special
> noncursive display effects for isolated Arabic characters, but rather
> to account for the general use to make certain required
> distinctions in Persian, in particular.
> I would expect, when it comes down to actual registry policies,
> that it would make sense for Persian to only allow registrations
> where the ZWNJ actually is used in Persian words. And for
> the Arabic language, registrations with ZWNJ should just be
> rejected, because they aren't needed for Arabic.
>> Would it be possible to edit the current regexp '/$L $T* ZWNJ $T* $R/'
>> so that it supports individual characters, or does it only work with
>> the joining definition of the character in the Unicode table?
> If you think of it like above, then there really is no reason for
> the protocol context rule to try to get more explicit with particular
> character combinations. It would just make it that much more
> complicated, for no real net gain. This rule is really just a
> general filter that rules out ZWNJ in meaningless contexts
> in IDNs, but ultimately once the protocol converts what passes
> that rule into punycode, you are going to rely on the lookup
> in the registry for an actual match.
>> There is also no specification for ZWJ for the Arabic script. Are there any
>> other resources with information about ZWJ usage for Arabic?
> I don't think there is any required context in the Arabic script
> where use of a ZWJ would make a distinction that would be needed
> for IDNs.
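
As an illustration of the context rule '/$L $T* ZWNJ $T* $R/' quoted above,
here is a minimal Python sketch. It assumes that $L means a left-joining or
dual-joining character, $R a right-joining or dual-joining character, and $T
a transparent one; the thread does not spell this out. The JOINING_TYPE table
is a small hand-copied subset for demonstration only, and the function names
are hypothetical. A real check would load the full joining-type data from the
Unicode Character Database.

```python
import re

ZWNJ = "\u200C"

# Illustrative subset of Unicode Joining_Type values (assumption: hand-copied
# for this sketch; a real implementation would read the full Unicode data).
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF (right-joining)
    "\u062F": "R",  # ARABIC LETTER DAL (right-joining)
    "\u0633": "D",  # ARABIC LETTER SEEN (dual-joining)
    "\u0637": "D",  # ARABIC LETTER TAH (dual-joining)
    "\u0638": "D",  # ARABIC LETTER ZAH (dual-joining)
    "\u0645": "D",  # ARABIC LETTER MEEM (dual-joining)
    "\u064E": "T",  # ARABIC FATHA (transparent)
}


def joining_class(ch):
    """Map a character to a one-letter class: Z for ZWNJ, U if not in the table."""
    if ch == ZWNJ:
        return "Z"
    return JOINING_TYPE.get(ch, "U")


def zwnj_context_ok(label):
    """Return True if every ZWNJ in label matches $L $T* ZWNJ $T* $R."""
    classes = "".join(joining_class(ch) for ch in label)
    for i, cls in enumerate(classes):
        if cls != "Z":
            continue
        before, after = classes[:i], classes[i + 1:]
        # $L $T* must end immediately before the ZWNJ ...
        left_ok = re.search(r"[LD]T*$", before) is not None
        # ... and $T* $R must start immediately after it.
        right_ok = re.match(r"T*[RD]", after) is not None
        if not (left_ok and right_ok):
            return False
    return True


if __name__ == "__main__":
    # TAH + ZWNJ + SEEN: both neighbours are dual-joining, so the rule passes.
    print(zwnj_context_ok("\u0637\u200C\u0633"))
    # ALEF + ZWNJ + SEEN: ALEF never joins to the following letter, so it fails.
    print(zwnj_context_ok("\u0627\u200C\u0633"))
```

Under these assumptions, TAH + ZWNJ + SEEN passes the rule even though the
ZWNJ is mostly invisible there, which is why such cases are left to registry
policy rather than to the protocol rule.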