Comments on bidi-04

Harald Tveit Alvestrand harald at alvestrand.no
Tue Mar 11 13:57:05 CET 2008



--On Monday, March 10, 2008 18:24:57 -0700 Mark Davis 
<mark.davis at icu-project.org> wrote:

> Harald,
>
> Shouting doesn't help. Supplying an actual test case where LRE makes a
> difference would help.
>
> You could well be right, but I'd simply like to see the test case to see
> what is going on, since it doesn't square with my understanding.

(one hour's Perl hacking later - nice way to start the morning)
It turns out that it does make a difference, but the difference does not 
matter in practice.

For instance, the string AN EN, when embedded between two dots, will behave 
as follows in an LTR context:

<RLE><PDF>.<AN><EN>.<RLE><PDF> reorders into <AN><EN>.. (that is, both dots 
jump to the right end of the string). (I hope you agree with me that "sor" 
and "eor" will both be "R" for the run that goes .<AN><EN>., while the 
embedding direction is L).
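
For anyone who wants to reproduce the input, here is a minimal Python sketch 
that builds the test string above and lists the bidi class of each code 
point. It does not run the reordering itself. The sample characters (U+0661 
for AN, "1" for EN) and the helper names build_test_string and bidi_classes 
are my own assumptions for illustration; they are not taken from the Perl 
script.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
AN = "\u0661"   # ARABIC-INDIC DIGIT ONE, assumed sample of bidi class AN
EN = "1"        # DIGIT ONE, assumed sample of bidi class EN


def build_test_string(label: str) -> str:
    """Wrap a label as <RLE><PDF>.label.<RLE><PDF> (hypothetical helper)."""
    return f"{RLE}{PDF}.{label}.{RLE}{PDF}"


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidi class of each code point in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


sample = build_test_string(AN + EN)
print(bidi_classes(sample))
```

The resulting list of classes can be fed to whatever bidi implementation is 
at hand to compare the reordering with and without the surrounding 
formatting codes.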

This affects 3 of the 53 length-2 combinations allowed by IDNA2003, and 43 
of the 375 length-3 combinations allowed by IDNA2003.

However, ALL of the affected strings turn out to be eliminated by our 
currently proposed set of selection criteria. If a string passes the test 
for "safe BIDI label" in bidi-04, it will not be affected by bidi 
formatting codes in the text around it.

So we can eliminate the restriction.

                   Harald

