Request for updated example highlighting problem of mixing of AN and EN

Alireza Saleh saleh at nic.ir
Tue Aug 18 11:54:14 CEST 2009


There is at least one example, sent by Harald: "CS R EN AN ES EN CS 
(.<alef><latin 1><arabic 1>-<latin 1>.) will rearrange into the same 
sequence as CS R EN ES EN AN CS (.<alef><latin 1>-<latin 1><arabic 1>.)"

The specification of rule N1 in UAX#9 is not very clear, and this has 
caused some inconsistency among the applications implementing the rule. 
This has been reported to Unicode. At that time I believed that, with a 
careful interpretation of N1 and some clarifying examples, there would be 
nothing to worry about in mixing AN and EN. I think the current draft 
change to UAX#9 tries to fix the bug by following the implementations 
rather than by interpreting the text correctly. However, we can implement 
rule W2 of UAX#9, which says:
' W2. Search backward from each instance of a European number until the 
first strong type (R, L, AL, or sor) is found. If an AL is found, change 
the type of the European number to Arabic number.'
Or, put simply: with no R in the bidi label, we can mix AN and EN.
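
As a rough illustration of W2 (a minimal sketch, not a full UAX#9 
implementation): the function name apply_w2 and the list-of-class-names 
input are my own assumptions, and a forward scan that remembers the last 
strong type stands in for the backward search described in the rule.

```python
from typing import List

# Strong bidi classes named in W2; "sor" is modelled by the `sor` argument.
STRONG = {"R", "L", "AL"}


def apply_w2(types: List[str], sor: str = "R") -> List[str]:
    """Return a copy of `types` with rule W2 applied.

    `types` is a list of bidi class names such as "AL", "EN", "AN".
    `sor` is the start-of-run type (hypothetical parameter; "R" or "L").
    """
    result = list(types)
    last_strong = sor  # nothing seen yet, so the search would reach sor
    for i, t in enumerate(types):
        if t in STRONG:
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            # The nearest preceding strong type is AL: EN becomes AN.
            result[i] = "AN"
    return result


# After an AL every EN turns into AN, so only AN remains.
print(apply_w2(["AL", "EN", "AN", "ES", "EN"]))
# After an R the EN stays EN, so AN and EN are still mixed.
print(apply_w2(["R", "EN", "AN", "ES", "EN"]))
```

If the only strong characters in a label are AL, every EN is changed to 
AN by this step, and the label no longer mixes the two number types.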

UAX#31 has been implemented for the use of ZWNJ in Arabic script.

Alireza


Thank you, Erik!
James Mitchell wrote:
> The only concrete example I have found that justifies the prohibition of
> mixing AN and EN is CS EN AN CS R in an LTR context.
> [http://www.alvestrand.no/pipermail/idna-update/2008-January/000858.html]
>
> Under the current bidi rules, plus the changes from a subsequent email from
> Mark, an AN will require the label to be treated as RTL
> [http://www.alvestrand.no/pipermail/idna-update/2009-August/005153.html].
> Therefore, a label mixing AN and EN will be treated as an RTL label.  The
> above example (EN AN) will violate the first bidi rule, that a label must
> begin with L, R or AL.
>
> Is there a concrete example that is otherwise IDNA-valid?
>
> From my understanding of the bidirectional algorithm and the current bidi
> rules, there is no otherwise covered case where mixing AN and EN leads to a
> label that violates the requirements (as distinct from the rules) of bidi.  As
> stated earlier, a label containing an AN is an RTL label.  An RTL label must
> start with an AL or R (rule 1) and must contain only R, AL, AN, EN, ES, CS,
> ET, ON, BN or NSM.  Note that the only strong characters in this label are
> AL and R (L is not allowed and sor is excluded because the first character
> must be AL or R).  Given that, no EN can resolve to an L
> [http://unicode.org/reports/tr9/#W7], therefore all AN and EN will resolve
> to the same levels.
>
> Or perhaps I am missing something?
>
> James Mitchell
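
The two conditions James quotes for an RTL label can be checked roughly 
as follows. This is only a sketch under assumptions: the function name 
check_rtl_label is hypothetical, it covers just the first-character and 
allowed-class conditions stated above, not the complete set of bidi 
rules, and it relies on Python's unicodedata for the bidi classes.

```python
import unicodedata

# Classes the quoted text allows in an RTL label.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def check_rtl_label(label: str) -> bool:
    """Check the two RTL-label conditions quoted above (hypothetical helper)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return False
    # Condition 1: the first character must be R or AL.
    if classes[0] not in {"R", "AL"}:
        return False
    # Condition 2: only the listed classes may appear (so no L).
    return all(c in RTL_ALLOWED for c in classes)


# <alef><latin 1><arabic 1>-<latin 1>, using Hebrew alef (class R)
print(check_rtl_label("\u05d01\u0661-1"))
# <latin 1><arabic 1>: begins with EN, so the first condition fails
print(check_rtl_label("1\u0661"))
```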


