bidi spec

Erik van der Poel erikv at google.com
Thu Feb 7 03:50:33 CET 2008


OK, I have tested the "remain grouped" property for labels of up to 8
characters using a single machine, and the "no two labels display the
same" property for labels of up to 6 characters (with a dot on either
side of the label). I found that the following rules can be removed:

If an R, AL or AN is present, no L may be present.
If an AN is present, no EN may be present.
If an AN is present, at least one R or AL must be present.

Note that the second rule above is equivalent to:

If an EN is present, no AN may be present.

So the second rule is redundant and can be removed.

However, I found that I have to add the following rule in order to
satisfy the "no two labels display the same" property:

If there is an EN or AN present, there may not be an NSM
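
A minimal sketch of how this rule could be checked for a single label,
using the bidi classes reported by Python's unicodedata module. The
function name is hypothetical and is not taken from the spec or from
ICU:

```python
import unicodedata

# Bidi classes that count as "numbers" for this rule.
NUMBER_CLASSES = {"EN", "AN"}


def violates_number_nsm_rule(label: str) -> bool:
    """Return True if the label contains an EN or AN together with an NSM.

    Hypothetical helper: it only checks the one rule proposed above,
    not the rest of the bidi rules for labels.
    """
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return bool(classes & NUMBER_CLASSES) and "NSM" in classes


# U+05D0 is R, U+0301 is NSM, "1" is EN.
print(violates_number_nsm_rule("\u05d0\u0301" + "1"))
print(violates_number_nsm_rule("\u05d0" + "1"))
```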

Without this rule, two different logical sequences display the same
under LTR:

R NSM EN -> EN NSM R
R EN NSM -> EN NSM R

Does this make sense?
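
The collision above can be reproduced with a heavily simplified sketch
of the bidi algorithm. It assumes an LTR paragraph with no embeddings
and handles only the classes L, R, EN and NSM: rule W1 (an NSM takes
the type of the preceding character), rule W7 (an EN after a strong L
becomes L), rule I1 at level 0 (R goes to level 1, EN to level 2), and
the L2 reversal. It is not ICU's implementation, and the function name
is hypothetical:

```python
def visual_order_ltr(classes):
    """Reorder a list of bidi class names for display in an LTR paragraph.

    Simplified sketch: supports only "L", "R", "EN" and "NSM".
    """
    # W1: an NSM takes the type of the preceding character
    # (or the paragraph direction, L, at the start of the text).
    resolved = []
    prev = "L"
    for c in classes:
        if c == "NSM":
            c = prev
        resolved.append(c)
        prev = c

    # W7: an EN whose nearest preceding strong type is L becomes L.
    last_strong = "L"
    for i, c in enumerate(resolved):
        if c in ("L", "R"):
            last_strong = c
        elif c == "EN" and last_strong == "L":
            resolved[i] = "L"

    # I1 at embedding level 0: R goes up one level, EN goes up two.
    level_of = {"L": 0, "R": 1, "EN": 2}
    items = [(level_of[r], c) for r, c in zip(resolved, classes)]

    # L2: from the highest level down to 1, reverse each maximal run
    # of characters at that level or higher.
    highest = max((lvl for lvl, _ in items), default=0)
    for lvl in range(highest, 0, -1):
        i = 0
        while i < len(items):
            if items[i][0] >= lvl:
                j = i
                while j < len(items) and items[j][0] >= lvl:
                    j += 1
                items[i:j] = reversed(items[i:j])
                i = j
            else:
                i += 1
    return [c for _, c in items]


print(visual_order_ltr(["R", "NSM", "EN"]))  # ['EN', 'NSM', 'R']
print(visual_order_ltr(["R", "EN", "NSM"]))  # ['EN', 'NSM', 'R']
```

Both inputs come out as EN NSM R, which is the collision the new rule
is meant to exclude.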

By the way, I removed BN from the test because I used ICU, which does
somewhat odd things with it, and the bidi algorithm says that BN can
be placed wherever you like (or something like that). I hope this can
be considered a fair test.

Erik

On Feb 5, 2008 5:38 PM, Erik van der Poel <erikv at google.com> wrote:
> In the bidi spec it says:
>
> "if 1 or more AN are allowed alone, AL AN, when put next to AN, is unstable"
>
> However, when I try it in ICU's implementation of the bidi algorithm,
> I get the following:
>
> LTR:
> in 0660 002E 0627 0661 002E
> out 0661 0627 002E 0660 002E
>
> RTL:
> in 0660 002E 0627 0661 002E
> out 002E 0661 0627 002E 0660
>
> where "in" is logical, "out" is visual. Note that there are 5
> characters, and there is a dot at the end, so the final label is empty
> (label "D" in the spec).
>
> If you are not seeing the above results with your implementation of
> the bidi algorithm, then it is not consistent with ICU's, and I'd like
> to find out where the problem is.
>
> Erik
>

