bidi spec

Mark Davis mark.davis at icu-project.org
Thu Feb 7 04:52:29 CET 2008


> However, I found that I have to add the following rule in order to
> satisfy the "no two labels display the same" property:
>
> If there is an EN or AN present, there may not be an NSM
>
> Without this rule, we get the following behavior under ltr:
>
> R NSM EN -> EN NSM R
> R EN NSM -> EN NSM R
>
> Does this make sense?


Here is what is happening with that. The bidi algorithm is designed for
display, and in display, the NSMs are designed to follow their base in
display order. An NSM that follows an R character in logical order is at the
same (odd) level as that character, so once that level is reversed the NSM
lands in front of its base. That is covered by the following rule:

*L3. Combining marks applied to a right-to-left base character will at this
point precede their base character. If the rendering engine expects them to
follow the base characters in the final display process, then the ordering
of the marks and the base character must be reversed.*

What you don't see when you look only at the reordered text is that the bidi
algorithm actually produces a series of levels, one for each character.
That level information is still available to the display engine when it
applies L3.
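As a minimal sketch of how those levels become display order, here is rule L2 in Python: from the highest level down to the lowest odd level, reverse every maximal run of characters at that level or higher. The levels are assumed to be already resolved and are passed in; `reorder_l2` is a hypothetical helper, not an ICU API. It returns each character's level along with the reordered text, since that is exactly the information an L3 pass needs.

```python
def reorder_l2(text, levels):
    """Apply UBA rule L2 to one line whose levels are already resolved.

    Returns the characters in display order together with their levels,
    which a later L3 pass still needs.
    """
    if len(text) != len(levels):
        raise ValueError("need exactly one level per character")
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    if not odd_levels:
        # No right-to-left material: display order equals logical order.
        return text, list(levels)
    pairs = list(zip(text, levels))
    # From the highest level down to the lowest odd level, reverse every
    # maximal run of characters whose level is at least the current one.
    for current in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] < current:
                i += 1
                continue
            j = i
            while j < len(pairs) and pairs[j][1] >= current:
                j += 1
            pairs[i:j] = pairs[i:j][::-1]
            i = j
    return "".join(ch for ch, _ in pairs), [lv for _, lv in pairs]
```

For the first quoted case, R NSM EN in an LTR paragraph resolves to levels 1, 1, 2. Using U+05D0, U+05B7 and "1" as sample R, NSM and EN characters, `reorder_l2("\u05d0\u05b71", [1, 1, 2])` gives `("1\u05b7\u05d0", [2, 1, 1])`: the NSM now precedes its base.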

So your test needs to perform L3 itself (ICU does not do this step, since L3
is targeted at display layout). That is, make one more pass through each
segment of text that is at an odd level, and reverse any sequence matching
the following regex:
/[:bc=NSM:]+ [:^bc=NSM:]/
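Here is a minimal Python sketch of that extra pass, assuming the caller already has the text in display order together with each character's resolved level (for example from an L2 step that keeps the levels). Python's `re` module has no `bc=NSM` property syntax, so the sketch scans each run of equal odd level by hand, reversing every sequence of one or more NSMs plus the following non-NSM, as the regex describes. `apply_l3` is a hypothetical name.

```python
import unicodedata
from itertools import groupby


def is_nsm(ch):
    return unicodedata.bidirectional(ch) == "NSM"


def apply_l3(display_text, display_levels):
    """Apply UBA rule L3 to text already reordered by rule L2.

    Within each run of characters at the same odd level, every sequence of
    one or more NSMs followed by a non-NSM (marks that now precede their
    base) is reversed so that the base comes first.
    """
    out = []
    pos = 0
    for level, group in groupby(display_levels):
        length = len(list(group))
        segment = list(display_text[pos:pos + length])
        pos += length
        if level % 2 == 1:
            i = 0
            while i < len(segment):
                if not is_nsm(segment[i]):
                    i += 1
                    continue
                j = i
                while j < len(segment) and is_nsm(segment[j]):
                    j += 1
                if j < len(segment):
                    # segment[i:j] are the marks, segment[j] is their base.
                    segment[i:j + 1] = segment[i:j + 1][::-1]
                    i = j + 1
                else:
                    i = j  # trailing marks with no base: no regex match
        out.extend(segment)
    return "".join(out)
```

With U+05D0, U+05B7 and "1" as sample R, NSM and EN characters, the two quoted cases both display as EN NSM R before L3, but with levels 2, 1, 1 and 2, 2, 1 respectively. `apply_l3("1\u05b7\u05d0", [2, 1, 1])` returns `"1\u05d0\u05b7"` (EN R NSM), while `apply_l3("1\u05b7\u05d0", [2, 2, 1])` leaves its input unchanged, so after L3 the two strings no longer display the same.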

=================

I haven't looked in detail at the rule "If an R, AL or AN is present, no L may
be present," and I don't have the other rules handy here, so it may be
redundant with them. But just to restate the test case:

For the collision test, your test should check each string in two environments:

a) RLM + test_string
b) LRM + test_string

You'll test a series of strings that cover all the combinations. There is a
collision failure if string1's bidi result in either (a) or (b) is the same
as string2's result in either (a) or (b).

With that, if I have (uppercase = right-to-left, lowercase = left-to-right)

test_string1 = Ab
test_string2 = bA

I will get:

test_string1 in (a) => bA
test_string2 in (b) => bA

thus a collision. I also get a collision with

test_string1 in (b) => Ab
test_string2 in (a) => Ab
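To make the test concrete, here is a minimal Python sketch of the collision check for the simplest case only: strings made entirely of strong characters, with uppercase letters standing in for R and lowercase for L as in the example above. That convention, and the names `strong_class`, `display` and `collides`, are assumptions of the sketch; a real test would run a full bidi implementation plus the L3 pass. U+200E and U+200F are the LRM and RLM code points.

```python
LRM, RLM = "\u200e", "\u200f"


def strong_class(ch):
    """Bidi class for this sketch only (assumption: uppercase = R, lowercase = L)."""
    if ch == RLM or ch.isupper():
        return "R"
    if ch == LRM or ch.islower():
        return "L"
    raise ValueError(f"sketch handles only strong characters, got {ch!r}")


def display(text):
    """Display order of strong-only text (rules P2/P3, I1/I2, L2), marks removed."""
    if not text:
        return ""
    classes = [strong_class(ch) for ch in text]
    # P2/P3: the first strong character (here the LRM/RLM prefix) sets the paragraph level.
    para = 1 if classes[0] == "R" else 0
    # I1/I2: R gets level 1; L gets 0 in an LTR paragraph and 2 in an RTL one.
    levels = [1 if c == "R" else 2 * para for c in classes]
    chars = list(text)
    # L2: from the highest level down to 1, reverse each maximal run at that level or higher.
    for current in range(max(levels), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < current:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= current:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(ch for ch in chars if ch not in (LRM, RLM))


def collides(s1, s2):
    """True if s1 and s2 can display identically, each prefixed by RLM (a) or LRM (b)."""
    shown1 = {display(RLM + s1), display(LRM + s1)}
    shown2 = {display(RLM + s2), display(LRM + s2)}
    return bool(shown1 & shown2)
```

`collides("Ab", "bA")` reports the collision worked out above, while `collides("ab", "ba")` does not, since an all-L string displays in logical order under either prefix.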

As I said, though, I don't have the other rules handy here -- "If an R, AL
or AN is present, no L may be present." may be just redundant.

Mark


More information about the Idna-update mailing list