Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Tue Dec 16 06:02:51 CET 2008

It is hard to tell from your code, since it depends on what the evaluation
of some of the subfunctions would yield.
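To show the dependence concretely, here is a rough Python transliteration of the quoted Perl loop. It is only a sketch. has_direction(), effective_direction_as_r() and effective_direction_identity() are hypothetical stand-ins for the subfunctions, which are not shown. The outer bound is taken as the last index of the list.

```python
# Hypothetical stand-ins for the subfunctions used by the quoted Perl loop.
DIRECTIONAL = {"L", "R", "EN", "AN"}


def has_direction(t):
    # Assumption: strong types and numbers count as directional; neutrals do not.
    return t in DIRECTIONAL


def effective_direction_as_r(t):
    # Hypothetical variant 1: numbers act like R, as the prose of N1 says.
    return "R" if t in ("EN", "AN") else t


def effective_direction_identity(t):
    # Hypothetical variant 2: every type is its own direction.
    return t


def resolve_neutrals_loop(types, effective_direction):
    """Python rendering of the quoted Perl loop over indices 1 .. last."""
    types = list(types)
    for ix in range(1, len(types)):
        if not has_direction(types[ix]):
            # Find the next directional type in the forward direction.
            for ix2 in range(ix + 1, len(types)):
                if has_direction(types[ix2]):
                    if effective_direction(types[ix2]) == effective_direction(types[ix - 1]):
                        types[ix] = effective_direction(types[ix - 1])
                    break
    return types
```

With the first variant, a neutral between two numbers resolves to R. With the second, AN N AN becomes AN AN AN, and a neutral between AN and EN is left unresolved.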

Instead, I'll list four test cases that you can check. I hasten to add that
I haven't fully verified these yet.

AN N  AN → AN R  AN
R AN N  EN → R AN R  EN
R EN N  AN → R EN R  AN
R EN N  EN → R EN R  EN

That is, the N in each of these cases would change to R.
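One way to check the four cases is a minimal sketch of N1 that treats EN and AN as R and resolves each maximal run of neutrals from its two neighbours. It works on bidi class names, not characters. The names NEUTRAL, as_strong() and apply_n1() are illustrative, and the W rules and N2 are deliberately left out.

```python
NEUTRAL = "N"  # generic placeholder for any neutral class


def as_strong(t):
    # N1: European and Arabic numbers act as if they were R.
    return "R" if t in ("EN", "AN") else t


def apply_n1(types, sos, eos):
    """Resolve each run of neutrals whose two neighbours agree in direction.

    sos/eos ("L" or "R") stand in for missing neighbours at the run edges.
    Neutrals whose neighbours disagree are returned unchanged.
    """
    out = list(types)
    i = 0
    while i < len(out):
        if out[i] != NEUTRAL:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == NEUTRAL:
            j += 1
        before = as_strong(out[i - 1]) if i > 0 else sos
        after = as_strong(out[j]) if j < len(out) else eos
        if before == after:
            out[i:j] = [before] * (j - i)
        i = j
    return out
```

Any neutral this leaves unchanged is where N2 would take over.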

The first two rules apply when a neutral follows an Arabic-Indic
digit. For example, if you have U+0668 + "!?" + U+0669, then the display
ordering of those four characters should be
U+0669 ? ! U+0668.
That is, ٨ followed by ! followed by ? followed by ٩ should appear from right
to left. In my emailer this works. From left to right I see the Arabic 9,
then ?, then !, then the Arabic 8.

٨!?٩
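The expected order can be reproduced with a small sketch of the implicit-level rules (I1, I2) and the reordering rule (L2). These are applied to the classes left after N1 (AN R R AN). The helper names are illustrative, and only the types L, R, EN and AN are handled.

```python
def implicit_levels(types, para_level):
    """I1/I2: raise each resolved type's level relative to the paragraph level."""
    if para_level % 2 == 0:
        bump = {"L": 0, "R": 1, "EN": 2, "AN": 2}  # I1 (even level)
    else:
        bump = {"L": 1, "R": 0, "EN": 1, "AN": 1}  # I2 (odd level)
    return [para_level + bump[t] for t in types]


def reorder(text, levels):
    """L2: from the highest level down to the lowest odd level, reverse
    every maximal run of characters at that level or higher."""
    chars, levels = list(text), list(levels)
    lowest_odd = min(levels) | 1  # smallest odd number >= the minimum level
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]  # keep levels aligned with chars
            i = j
    return "".join(chars)
```

A left-to-right paragraph (level 0) and a right-to-left one (level 1) both give ٩ ? ! ٨ when read from left to right.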

The latter two rules are in effect if an EN remains in the text, e.g. if an
English number follows an Arabic letter and W7 has not been invoked. That is, a
‎ج‎ followed by 8, then !, then ?, then 9 should all appear RTL.

‎ج‎8!?9

That case fails in my emailer.
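The two readings only diverge once N2 is applied. The sketch below compares them for the class sequence R EN N N EN. The bidi classes are supplied by hand, and a Hebrew letter (U+05D0, class R) stands in for the strong right-to-left character. The W rules are not modeled. LITERAL_PAIRS is an assumed encoding of the six rules that remain if the table is read as complete.

```python
NEUTRAL = "N"

# Assumed: the (before, after) pairs a literal reading of the N1 table resolves.
LITERAL_PAIRS = {("L", "L"), ("R", "R"), ("R", "AN"), ("R", "EN"), ("AN", "R"), ("EN", "R")}


def fold(t):
    return "R" if t in ("EN", "AN") else t


def resolve_neutrals(types, para_level, numbers_as_r):
    """N1 then N2 for a single level run at the paragraph level."""
    embedding = "L" if para_level % 2 == 0 else "R"  # also used as sos/eos
    out = list(types)
    i = 0
    while i < len(out):
        if out[i] != NEUTRAL:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == NEUTRAL:
            j += 1
        before = out[i - 1] if i > 0 else embedding
        after = out[j] if j < len(out) else embedding
        if numbers_as_r:  # EN/AN act as R
            resolved = fold(before) if fold(before) == fold(after) else None
        else:  # only the literal table rows
            resolved = fold(before) if (before, after) in LITERAL_PAIRS else None
        out[i:j] = [resolved or embedding] * (j - i)  # None -> N2: embedding direction
        i = j
    return out


def implicit_levels(types, para_level):
    """I1/I2 for the resolved types L, R, EN, AN."""
    if para_level % 2 == 0:
        bump = {"L": 0, "R": 1, "EN": 2, "AN": 2}
    else:
        bump = {"L": 1, "R": 0, "EN": 1, "AN": 1}
    return [para_level + bump[t] for t in types]


def reorder(text, levels):
    """L2: reverse runs at or above each level, highest down to lowest odd."""
    chars, levels = list(text), list(levels)
    for level in range(max(levels), (min(levels) | 1) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


def display(text, types, para_level, numbers_as_r):
    resolved = resolve_neutrals(types, para_level, numbers_as_r)
    return reorder(text, implicit_levels(resolved, para_level))


TEXT = "\u05d0" + "8!?9"
TYPES = ["R", "EN", "N", "N", "EN"]
```

In a left-to-right paragraph the complete reading displays 9 ? ! 8 followed by the letter, while the literal reading displays 8, the letter, then ! ? 9. In a right-to-left paragraph N2 also yields R, so the two readings agree there.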

*Background:*

Here is the text: http://unicode.org/reports/tr9/#N1

The issue is that the text says that AN and EN act like R, and then gives a
set of rules. Those rules don't explicitly list all of the combinations of
R, EN, and AN on both sides of an N. Listing them all would add four more
rules: the AN N AN, AN N EN, EN N AN, and EN N EN rows below.

L  N  L  → L  L  L
R  N  R  → R  R  R
R  N  AN → R  R  AN
R  N  EN → R  R  EN
AN N  R  → AN R  R
AN N  AN → AN R  AN
AN N  EN → AN R  EN
EN N  R  → EN R  R
EN N  AN → EN R  AN
EN N  EN → EN R  EN

If someone interpreted the rules as being complete, they would neglect
to change neutrals into R in those four cases.
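A quick consistency check compares the prose rule (EN and AN act as R) with a reading that treats a six-row table as complete. The check confirms that the prose rule reproduces all ten rows. It also shows which contexts the shorter table leaves unresolved. This is a sketch: LITERAL_TABLE is an assumption about how such a reading would be encoded, namely the ten rows above minus the AN N AN, AN N EN, EN N AN and EN N EN rows.

```python
from itertools import product

# The ten rows of the N1 table above: (before, after) -> resolved neutral.
FULL_TABLE = {
    ("L", "L"): "L",
    ("R", "R"): "R", ("R", "AN"): "R", ("R", "EN"): "R",
    ("AN", "R"): "R", ("AN", "AN"): "R", ("AN", "EN"): "R",
    ("EN", "R"): "R", ("EN", "AN"): "R", ("EN", "EN"): "R",
}

# The four rows that match the test cases at the top of this message.
ADDED = {("AN", "AN"), ("AN", "EN"), ("EN", "AN"), ("EN", "EN")}

# Assumed: what a reader who took the shorter table as complete would have.
LITERAL_TABLE = {k: v for k, v in FULL_TABLE.items() if k not in ADDED}


def fold(t):
    # N1 prose: European and Arabic numbers act as if they were R.
    return "R" if t in ("EN", "AN") else t


def general_rule(before, after):
    """Resolved type if both sides agree after folding, else None."""
    return fold(before) if fold(before) == fold(after) else None


def missed_by_literal_reading():
    """(before, after) pairs that the prose resolves but LITERAL_TABLE does not."""
    missed = []
    for before, after in product(["L", "R", "EN", "AN"], repeat=2):
        expected = general_rule(before, after)
        if expected is not None and LITERAL_TABLE.get((before, after)) != expected:
            missed.append((before, after))
    return missed
```

missed_by_literal_reading() returns exactly the four pairs in ADDED.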

Mark

On Mon, Dec 15, 2008 at 11:39, Harald Alvestrand <harald at alvestrand.no> wrote:

> Mark Davis wrote:
> > Let me try to shed some light on this. In the Unicode bidi
> > subcommittee, there are four different items that have recently come
> > up regarding BIDI.
> >
> > The first is just some editorial clarifying text, and has already been
> > discussed and approved by the UTC. This is not relevant to IDNA.
> >
> > The others were too recent to have been considered by the UTC.
> >
> > The second is regarding overriding mirroring for archaic scripts. This
> > is not relevant to IDNA.
> >
> > The third only applies to the embedding/overriding codes, which are
> > not allowed in IDNs: RLE, LRE, RLO, LRO, and PDF. This is not relevant
> > to IDNA.
> >
> > The last is relevant, and came up most recently. It is the following:
> >
> > There is reasonable disagreement about what the meaning of a
> > particular rule (N1) is, with two possible interpretations. We know
> > the intent of the author, but the intent of the author is outweighed
> > by what the common practice is. That is, the UTC needs to be quite
> > conservative about changes to BIDI, and existing practice is the major
> > consideration. That requires determining, however, what the prevailing
> > practice is, so we're investigating that now.
> >
> > The practical impact for IDNA is, I think, the following.
> >
> > 1. As a part of investigating the common practice, we need to consider
> > whether we need to add additional constraints to what Harald has
> > devised. I see two possible approaches:
> >
> >    1. We can wait until the investigation is completed, and accommodate
> >       the results;
> >    2. Alternatively, we can add constraints (if need be) that
> >       accomplish the goal no matter which of the two interpretations
> >       of N1 is being used.
> >
> >
> > 2. We should add to the security considerations for bidi some
> > indication of the fact that while the bidi constraints are intended to
> > ensure "Character Grouping" and "Label Uniqueness" as much as
> > possible, they may not do so for certain cases:
> >
> >    1. If the label is adjacent to all-ASCII labels (the xxx.3com problem).
> >    2. If the particular implementation of the bidi algorithm deviates
> >       from the standard.
> >
> Mark,
>
> what are the 2 interpretations?
>
> FWIW, here's my (horribly inefficient) interpretation of N1:
>
>    # 3.3.4 Resolving neutral types.
>    # N1. A sequence of neutrals takes the direction of the surrounding...
>    # has_direction() and effective_direction() are defined elsewhere.
>    for my $ix (1..$#typelist) {
>        if (!has_direction($typelist[$ix])) {
>            # find directional in the forward direction
>            for my $ix2 ($ix+1..$#typelist) {
>                if (has_direction($typelist[$ix2])) {
>                    if (effective_direction($typelist[$ix2])
>                        eq effective_direction($typelist[$ix-1])) {
>                        $typelist[$ix] = effective_direction($typelist[$ix-1]);
>                    }
>                    last;
>                }
>            }
>        }
>    }
>
> I don't know what that counts as.
>
>                       Harald
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>