Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Harald Tveit Alvestrand harald at alvestrand.no
Tue Dec 16 07:16:11 CET 2008


Mark Davis wrote:
> It is hard to tell from your code, since it depends on what the 
> evaluation of some of the subfunctions would yield.
>
> Instead, I'll list four test cases that you can check. I hasten to 
> add that I haven't fully checked these yet.
> AN N  AN → AN R  AN
> R AN N  EN → R AN R  EN
> R EN N  AN → R EN R  AN
>
> R EN N  EN → R EN R  EN
>   
> That is, the N in each of these cases would change to R.
>
> The first two rules are when you have a neutral between two 
> Arabic-Indic digits. For example, if you have U+0668 + "!?" + U+0669, 
> then the display ordering of those characters should be
> U+0669 ? ! U+0668.
> That is, ٨ followed by ! followed by ٩ should appear from right to 
> left. In my emailer this works. From left to right I see the Arabic 9, 
> then ? then !, then the Arabic 8.
>
> ٨!?٩
>
> The latter two rules are in effect if an EN remains in the text, e.g. if 
> an English number follows an Arabic letter and W7 has not been invoked. 
> Then a ‎ج‎ followed by 8 followed by ! followed by 9 should all 
> appear RTL.
>
> ‎ج‎8!?9
>
> That case fails in my emailer.
>
> *Background:*
>
> Here is the text: http://unicode.org/reports/tr9/#N1
>
> The issue is that the text says that AN and EN act like R, and then 
> has a set of rules. Those rules don't explicitly list all of the 
> combinations of R, EN, AN on both sides of an N. That would add 4 
> more rules, those added in yellow below.
>
>     L  N  L  → L  L  L
>     R  N  R  → R  R  R
>     R  N  AN → R  R  AN
>     R  N  EN → R  R  EN
>     AN N  R  → AN R  R
>     AN N  AN → AN R  AN
>
>     AN N  EN → AN R  EN
>     EN N  R  → EN R  R
>     EN N  AN → EN R  AN
>
>     EN N  EN → EN R  EN
>         
>
> If someone interpreted the rules as being complete, then they would 
> neglect to change neutrals into R in those 4 cases.
I interpreted the 4 examples given as test cases I could verify against, 
and implemented what the text says. So my code "passed" - that is, all 
of those strings were displayed right-to-left, even when the embedding 
direction was L.
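
For anyone who wants to check the same cases, here is a minimal sketch
of the neutral resolution described above. It is not my actual code and
not a full bidi implementation. It assumes the input is a list of class
names already reduced to L, R, AN, EN and N (weak types resolved), and
that the start and end of the run take the embedding direction. The
names resolve_neutrals and STRONG_DIRECTION are made up for this
illustration.

```python
# Hypothetical sketch: resolve neutrals (N) the way the text of N1 reads,
# i.e. AN and EN count as R when looking at what surrounds a neutral.
# Assumption: input contains only "L", "R", "AN", "EN", "N".
STRONG_DIRECTION = {"L": "L", "R": "R", "AN": "R", "EN": "R"}


def resolve_neutrals(classes, embedding="L"):
    """Return a copy of classes with every N replaced by L or R."""
    out = list(classes)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != "N":
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < n and out[j] == "N":
            j += 1
        # Direction on each side; run boundaries use the embedding direction
        # (a simplification of sor/eor).
        before = STRONG_DIRECTION[out[i - 1]] if i > 0 else embedding
        after = STRONG_DIRECTION[out[j]] if j < n else embedding
        # Same direction on both sides: neutrals take it (N1).
        # Otherwise they take the embedding direction (N2).
        direction = before if before == after else embedding
        for k in range(i, j):
            out[k] = direction
        i = j
    return out


if __name__ == "__main__":
    for case in (["AN", "N", "AN"],
                 ["R", "AN", "N", "EN"],
                 ["R", "EN", "N", "AN"],
                 ["R", "EN", "N", "EN"]):
        print(case, "->", resolve_neutrals(case, embedding="L"))
```

With embedding direction L, the N in each of the four test cases comes
out as R, which is what the quoted table asks for.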

                       Harald


