Rules for ZWJ and ZWNJ (Re: Moving Right Along on the Inclusions Table...)

Mark Davis mark.davis at icu-project.org
Thu Dec 21 15:57:04 CET 2006


Very briefly, since I have a call coming up.

Yes, the "produce a visual distinction" issue is one that we want to narrow.

Characters could be added to the shaping property, but all that would mean
is that additional strings become available in future versions of Unicode,
which already happens whenever new characters are added. (Will finish this
later.)

Mark

On 12/21/06, Harald Alvestrand <harald at alvestrand.no> wrote:
>
> Mark Davis wrote:
> > I've linked to it on several occasions:
> > http://www.unicode.org/review/pr-96.html
> >
> > While it is not completely settled -- it is out for review now and you
> > can see the questions we are asking -- I don't see a problem with it
> > progressing to the point where we can use it by the time the other
> > work we are doing is ready.
> Well, one problem with it is that it requires a certain amount of
> chasing down references that are obscure to the casual reader.... I'm
> paraphrasing the rule below, to see if I understand it:
>
> I read it as saying that ZWJ can occur only after a virama (with a few
> more conditions), that is, after one of the combining marks with
> canonical combining class (ccc) 9 in the Unicode property tables:
>
> 094D;DEVANAGARI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 09CD;BENGALI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0A4D;GURMUKHI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0ACD;GUJARATI SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0B4D;ORIYA SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0BCD;TAMIL SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0C4D;TELUGU SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0CCD;KANNADA SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0D4D;MALAYALAM SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 0DCA;SINHALA SIGN AL-LAKUNA;Mn;9;NSM;;;;;N;;;;;
> 0E3A;THAI CHARACTER PHINTHU;Mn;9;NSM;;;;;N;THAI VOWEL SIGN PHINTHU;;;;
> 0F84;TIBETAN MARK HALANTA;Mn;9;NSM;;;;;N;TIBETAN VIRAMA;;;;
> 1039;MYANMAR SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 1714;TAGALOG SIGN VIRAMA;Mn;9;NSM;;;;;N;;;;;
> 1734;HANUNOO SIGN PAMUDPOD;Mn;9;NSM;;;;;N;;;;;
> 17D2;KHMER SIGN COENG;Mn;9;NSM;;;;;N;;;;;
> A806;SYLOTI NAGRI SIGN HASANTA;Mn;9;NSM;;;;;N;;;;;
> 10A3F;KHAROSHTHI VIRAMA;Mn;9;NSM;;;;;N;;;;;
>
> (I may have missed some, since I found these by "grep". Are there virama
> that don't fall into class Mn?)
> This is a bit more than Devanagari, but they may all be scripts where
> this is guaranteed to cause no harm (as seen from an IDNA viewpoint).
> Can anyone verify?
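
One way to cross-check the grep result above is to ask a Unicode library for
every code point whose canonical combining class is 9. What follows is a
minimal sketch, assuming Python's standard unicodedata module. The helper
name ccc9_by_category is made up for illustration, and the output depends on
the Unicode version bundled with the interpreter, so it will generally list
more characters than the data quoted here.

```python
import sys
import unicodedata
from collections import defaultdict


def ccc9_by_category():
    """Group all code points with canonical combining class 9 by General Category."""
    groups = defaultdict(list)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.combining(ch) == 9:
            groups[unicodedata.category(ch)].append(cp)
    return dict(groups)


if __name__ == "__main__":
    for category, code_points in sorted(ccc9_by_category().items()):
        print(category, len(code_points))
        for cp in code_points:
            # Fall back to a placeholder if the character has no name.
            print(f"  {cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}")
```

Any category other than Mn in the output would point to ccc 9 characters that
a search restricted to Mn misses.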
>
> A ZWNJ can occur in the same kind of position too (same regexp).
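
The paraphrased rule can be sketched as a simple context check. This is only
an approximation under stated assumptions: it treats "virama" as "any
character with ccc 9", it ignores the "few more conditions" mentioned above,
and the function name joiners_follow_virama is hypothetical rather than taken
from PR-96.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def joiners_follow_virama(label: str) -> bool:
    """Return True if every ZWJ/ZWNJ is immediately preceded by a ccc 9 character.

    Assumption: "virama" is approximated as canonical combining class 9.
    """
    for i, ch in enumerate(label):
        if ch in (ZWJ, ZWNJ):
            if i == 0 or unicodedata.combining(label[i - 1]) != 9:
                return False
    return True


if __name__ == "__main__":
    # KA + VIRAMA + ZWJ + SSA: the joiner follows a virama.
    print(joiners_follow_virama("\u0915\u094D\u200D\u0937"))
    # A joiner between two Latin letters is rejected.
    print(joiners_follow_virama("a\u200Db"))
```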
>
> ("harm from an IDNA viewpoint" is probably confusability... the question
> asked in the -96 file is:
>
> In particular, in which scripts of South East Asia are ZWJ and ZWNJ not
> necessary for visual distinctions?
>
> while the classical IDNA question would be:
>
> In particular, are there scripts of South East Asia where ZWJ and ZWNJ
> can occur after a virama without causing a visual distinction?)
>
>
> A ZWNJ may also occur between a Right-joining and a Left-joining
> character (either of those may be Dual-joining, too), with possible
> embedded Transparent characters.
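
The joining-context rule can be illustrated with a small scanner. This is a
sketch under several assumptions: the character before the ZWNJ (in logical
order) must be Left- or Dual-joining and the one after it Right- or
Dual-joining, which is one reading of the paraphrase above. The JOINING_TYPE
table is a tiny hand-picked excerpt for illustration, where a real check
would load ArabicShaping.txt, and the function names are hypothetical.

```python
import unicodedata

ZWNJ = "\u200C"

# Illustrative excerpt only; a real implementation would parse ArabicShaping.txt.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u200C": "U",  # ZWNJ, listed explicitly so it is not treated as transparent
    "\u200D": "C",  # ZWJ, join-causing
}


def joining_type(ch: str) -> str:
    """Look up the joining type, deriving T for unlisted Mn/Me/Cf characters."""
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    return "T" if unicodedata.category(ch) in ("Mn", "Me", "Cf") else "U"


def zwnj_in_joining_context(text: str, i: int) -> bool:
    """Check the ZWNJ at text[i], skipping Transparent characters on both sides."""
    if text[i] != ZWNJ:
        raise ValueError("text[i] is not ZWNJ")
    j = i - 1
    while j >= 0 and joining_type(text[j]) == "T":
        j -= 1
    k = i + 1
    while k < len(text) and joining_type(text[k]) == "T":
        k += 1
    before_ok = j >= 0 and joining_type(text[j]) in ("L", "D")
    after_ok = k < len(text) and joining_type(text[k]) in ("R", "D")
    return before_ok and after_ok
```

For example, BEH + ZWNJ + ALEF satisfies the check, while a ZWNJ between two
Latin letters does not.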
>
> This property is from ArabicShaping.txt, which says:
> # - Those that not explicitly listed that are of General Category Mn, Me, or Cf
> #   have joining type T.
> None are explicitly listed, so the general categories have to be used
> for finding transparent characters. However, all the possible occurrences
> of Right-joining and Left-joining characters are in ArabicShaping.txt,
> so this rule is then limited to the Arabic script. (right?)
>
> So we have 69 right-joining and 170 dual-joining characters in
> ArabicShaping.txt - I'm assuming a stability guarantee that no
> characters outside of Arabic will be added to this file in the future.
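
Counts like these can be reproduced by tallying the joining-type field of
ArabicShaping.txt, whose data lines have the form "code point; short name;
joining type; joining group". The following is a minimal sketch. The SAMPLE
text is a short illustrative excerpt rather than the full file, and
count_joining_types is a hypothetical helper name.

```python
from collections import Counter

# Illustrative excerpt in the ArabicShaping.txt format, not the full file.
SAMPLE = """\
# code point; short name; joining type; joining group
0627; ALEF; R; ALEF
0628; BEH; D; BEH
062F; DAL; R; DAL
0646; NOON; D; NOON
"""


def count_joining_types(lines):
    """Count data lines per joining type, ignoring comments and blank lines."""
    counts = Counter()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing or full-line comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    print(count_joining_types(SAMPLE.splitlines()))
```

Passing the lines of the real file instead of SAMPLE should give the totals
for whichever Unicode version that file belongs to.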
>
> >
> > Mark
> >
> > On 12/20/06, Harald Alvestrand <harald at alvestrand.no> wrote:
> >
> >     Mark Davis wrote:
> >     > Those are all reasonable changes.
> >     >
> >     >     * We should also add the Joiner/NonJoiner. They would, however,
> >     >       as discussed, be restricted to very specific contexts by
> >     >       additional clauses (like the current bidi restrictions).
> >     >
> >     Mark, can you take a stab at writing down those rules?
> >     I have seen you refer to this as a "solved problem" a couple of
> >     times, but I haven't seen a specific algorithm proposed yet.
> >
> >
>
>