bidi spec

Harald Alvestrand harald at alvestrand.no
Thu Feb 7 05:34:10 CET 2008


Mark Davis wrote:
>
>     However, I found that I have to add the following rule in order to
>     satisfy the "no two labels display the same" property:
>
>     If there is an EN or AN present, there may not be an NSM
>
>     Without this rule, we get the following behavior under ltr:
>
>     R NSM EN -> EN NSM R
>     R EN NSM -> EN NSM R
>
>     Does this make sense?
>
>
> Here is what is happening with that. The BIDI algorithm is designed
> for display, and in display, the NSMs are designed to follow their
> base -- in display order. That means an NSM following an R character
> will come after it within its level (odd). So that is covered by the
> following rule:
>
> /L3. Combining marks applied to a right-to-left base character will at
> this point precede their base character. If the rendering engine
> expects them to follow the base characters in the final display
> process, then the ordering of the marks and the base character must be
> reversed./
>
> What you are not seeing when you just look at the text is that what
> the bidi algorithm actually produces is a series of levels associated
> with the text. That level information is available at the time of L3
> for the display engine to use.
>
> So what you need to do is make your test do L3 (not done by ICU, since
> it is targeted at display layout). So you need to make one more pass
> through each segment of text that is at an odd level, and reverse any
> sequence matching the following regex:
>
> /[:bc=NSM:]+ [:^bc=NSM:]/

Thanks Mark!

I have a bit of difficulty parsing the text of L3, since Erik's last
example is R EN NSM. There the NSM is applied to the EN, which is not a
right-to-left base character, so the text was probably intended to say
"applied to a base character that currently has a right-to-left
direction". I also have a bit of difficulty implementing it, since the
NSM at this stage in the process has been changed to either EN or R, so
I'll have to look at its "original" type rather than its "current" type.
But that's implementable.
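
As a rough illustration of that extra pass, here is a minimal Python
sketch. It is not ICU's API or any existing implementation: the Item
record, reorder_l2 and apply_l3 are hypothetical names, the levels are
assigned by hand (R at level 1; EN at level 2; the NSM inheriting the
type of the preceding character) on the assumption that this is what
the resolution rules give under an ltr paragraph, and "segment at an
odd level" is read as a maximal run of items sharing one odd level.
Each item keeps its original bidi class so the pass can still recognise
an NSM after its type has been rewritten.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    label: str       # stand-in for the actual character
    orig_class: str  # bidi class before the NSM was retyped
    level: int       # resolved level (assigned by hand in this sketch)


def reorder_l2(items: List[Item]) -> List[Item]:
    """Rule L2: from the highest level down to the lowest odd level,
    reverse every contiguous run at that level or higher."""
    out = list(items)
    odd = [it.level for it in out if it.level % 2 == 1]
    if not odd:
        return out
    for lvl in range(max(it.level for it in out), min(odd) - 1, -1):
        i = 0
        while i < len(out):
            if out[i].level < lvl:
                i += 1
                continue
            j = i
            while j < len(out) and out[j].level >= lvl:
                j += 1
            out[i:j] = out[i:j][::-1]
            i = j
    return out


def apply_l3(visual: List[Item]) -> List[Item]:
    """Extra pass: inside each run at one odd level, reverse every
    NSM+ non-NSM sequence, testing the ORIGINAL class of each item."""
    out = list(visual)
    i = 0
    while i < len(out):
        lvl = out[i].level
        j = i
        while j < len(out) and out[j].level == lvl:
            j += 1
        if lvl % 2 == 1:
            k = i
            while k < j:
                m = k
                while m < j and out[m].orig_class == "NSM":
                    m += 1
                if k < m < j:
                    # out[k:m] are marks and out[m] is their base
                    out[k:m + 1] = out[k:m + 1][::-1]
                    k = m + 1
                else:
                    k = max(m, k + 1)
        i = j
    return out


def labels(items: List[Item]) -> str:
    return " ".join(it.label for it in items)


# The two labels from the quoted example, in logical order.
case1 = [Item("R", "R", 1), Item("NSM", "NSM", 1), Item("EN", "EN", 2)]
case2 = [Item("R", "R", 1), Item("EN", "EN", 2), Item("NSM", "NSM", 2)]

print(labels(reorder_l2(case1)), "|", labels(reorder_l2(case2)))
print(labels(apply_l3(reorder_l2(case1))), "|",
      labels(apply_l3(reorder_l2(case2))))
```

Under those assumptions, both labels come out of reorder_l2 as
EN NSM R, while after apply_l3 the first becomes EN R NSM and the
second stays EN NSM R, so the two no longer display the same.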

(If suggesting an amendment, I'd make it talk about NSM, not combining
marks - there are the two Mn characters that are not NSM.)

                    Harald



More information about the Idna-update mailing list