Request for updated example highlighting problem of mixing of AN and EN

Thu Aug 20 01:16:12 CEST 2009

I misread your email, thanks for the clarification.

James

> -----Original Message-----
> From: Alireza Saleh [mailto:saleh at nic.ir]
> Sent: Wednesday, 19 August 2009 9:27 PM
> To: James Mitchell
> Cc: idna-update at alvestrand.no
> Subject: Re: Request for updated example highlighting problem of mixing of AN
> and EN
> 
> Look at this
> 
> http://unicode.org/cldr/utility/bidi.jsp?a=.%D7%941%D9%A1-1.%0D%0A.%D7%941-1%D9%A1.&p=LTR&hack=on
> 
> Alireza
> 
> James Mitchell wrote:
> > With that example I get..
> >
> > Bidi Class: CS  R EN AN ES EN CS
> > Resolved:    e  R EN AN ON EN  e
> > Level:          1  2  2  1  2
> > Display:    ON EN ES EN AN  R ON
> >
> > My concern is that the online tool provided by Unicode
> > [http://unicode.org/cldr/utility/bidi.jsp?a=.A19-1.&p=LTR&hack=on]
> > rearranged the characters into something other than your example.  This
> > label appears fine to me; I believe it satisfies the label uniqueness test,
> > remembering that a label cannot begin with an EN.  Note that switching the R
> > for an AL yields the same display order.
> >
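For reference, the "Bidi Class" row above can be reproduced with Python's standard unicodedata module. This is only a sketch: the label string is an assumption taken from the escaped URL parameter in Alireza's message (U+05D4 for the R character, U+0661 for the Arabic-Indic digit one), and the helper name bidi_classes is made up for illustration.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Assumed label: '.', U+05D4, '1', U+0661, '-', '1', '.'
label = ".\u05D41\u0661-1."
print(" ".join(bidi_classes(label)))
```

Swapping U+05D4 for an Arabic letter such as U+0627 changes only the second class, from R to AL.
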
> > I do not understand how rule N1 was unclear, but everyone is different.  I
> > was not aware of inconsistency among applications in this case; however, I
> > believe this is a moot point, since we should not be designing this protocol
> > to work around problems in applications.
> >
> > So what is the issue here?
> >
> > James
> >
> >
> >> -----Original Message-----
> >> From: Alireza Saleh [mailto:saleh at nic.ir]
> >> Sent: Tuesday, 18 August 2009 7:54 PM
> >> To: James Mitchell
> >> Cc: idna-update at alvestrand.no
> >> Subject: Re: Request for updated example highlighting problem of mixing of
> >> AN and EN
> >>
> >> There is at least one example, which was sent by Harald:
> >> "CS R EN AN ES EN CS (.<alef><latin 1><arabic 1>-<latin 1>.) will
> >> rearrange into the same sequence as CS R EN ES EN AN CS
> >> (.<alef><latin 1>-<latin 1><arabic 1>.)"
> >>
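Harald's claim can be checked with a small sketch of rule L2 of UAX#9: from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher. Several things here are assumptions: the function name reorder_l2, the use of U+05D4 and U+0661 as stand-ins for the R character and the Arabic digit, and the levels for the second label, which are a hand derivation in an LTR paragraph rather than tool output. The levels for the first label are the ones James reports from the Unicode tool, with level 0 assumed for the two dots.

```python
def reorder_l2(items, levels):
    """Reorder items for display using rule L2 and precomputed levels."""
    out = list(items)
    lv = list(levels)
    odd_levels = [l for l in lv if l % 2 == 1]
    if not odd_levels:
        return out  # nothing is right-to-left, so nothing moves
    for level in range(max(lv), min(odd_levels) - 1, -1):
        i = 0
        while i < len(out):
            if lv[i] < level:
                i += 1
                continue
            # Find the end of the run at this level or higher, then reverse it.
            j = i
            while j < len(out) and lv[j] >= level:
                j += 1
            out[i:j] = out[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return out


# CS R EN AN ES EN CS, levels as reported by the Unicode tool
first = reorder_l2(".\u05D41\u0661-1.", [0, 1, 2, 2, 1, 2, 0])
# CS R EN ES EN AN CS, levels derived by hand (assumption)
second = reorder_l2(".\u05D41-1\u0661.", [0, 1, 2, 2, 2, 2, 0])
print(first == second)
```

Under those assumed levels both labels come out in the same visual order, which is the ambiguity Harald's example describes.
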
> >> The specification of rule N1 in UAX#9 is not very clear, and this has
> >> caused some inconsistency among the different applications implementing
> >> the rule. This has been reported to Unicode. At that time I believed that,
> >> with a careful interpretation of rule N1 and some clarifying examples,
> >> there would be nothing to worry about in mixing AN and EN. I think the
> >> current draft change to UAX#9 is trying to fix the bug to match the
> >> implementations rather than interpreting the text correctly. However, we
> >> can implement rule W2 of UAX#9, which says:
> >> 'W2. Search backward from each instance of a European number until the
> >> first strong type (R, L, AL, or sor) is found. If an AL is found, change
> >> the type of the European number to Arabic number.' Or we can simply say
> >> that a bidi label containing no R may mix AN and EN.
> >>
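The W2 rule quoted above is short enough to sketch directly. This is a simplified illustration that works on a list of bidi class names and ignores the other weak-type rules; the function name apply_w2 and the default sor value are assumptions, not part of UAX#9.

```python
def apply_w2(classes, sor="L"):
    """Apply rule W2: an EN whose nearest preceding strong type is AL becomes AN."""
    result = list(classes)
    for i, cls in enumerate(classes):
        if cls != "EN":
            continue
        strong = sor  # used when no strong type precedes the EN
        for j in range(i - 1, -1, -1):
            if classes[j] in ("R", "L", "AL"):
                strong = classes[j]
                break
        if strong == "AL":
            result[i] = "AN"
    return result


print(apply_w2(["AL", "EN", "AN", "ES", "EN"]))  # both EN follow an AL
print(apply_w2(["R", "EN", "AN", "ES", "EN"]))   # unchanged: nearest strong is R
```

With an AL as the only strong character, every EN is converted, so no EN is left to mix with the AN. With an R the EN survive, which is the case the thread is concerned with.
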
> >> UAX#31 has been implemented for the use of ZWNJ in Arabic script.
> >>
> >> Alireza
> >>
> >>
> >> Thank you, Erik!
> >> James Mitchell wrote:
> >>
> >>> The only concrete example I have found that justifies the prohibition of
> >>> mixing AN and EN is CS EN AN CS R in an LTR context.
> >>> [http://www.alvestrand.no/pipermail/idna-update/2008-January/000858.html]
> >>>
> >>> Under the current bidi rules, plus the changes from a subsequent email from
> >>> Mark, an AN will require the label to be treated as RTL
> >>> [http://www.alvestrand.no/pipermail/idna-update/2009-August/005153.html].
> >>> Therefore, a label mixing AN and EN will be treated as an RTL label.  The
> >>> above example (EN AN) will violate the first bidi rule, that a label must
> >>> begin with L, R or AL.
> >>>
> >>> Is there a concrete example that is otherwise IDNA-valid?
> >>>
> >>> From my understanding of the bidirectional algorithm and the current bidi
> >>> rules, there is no otherwise covered case where mixing AN and EN leads to a
> >>> label that violates the requirements (as distinct from the rules) of bidi.
> >>> As stated earlier, a label containing an AN is an RTL label.  An RTL label
> >>> must start with an AL or R (rule 1) and must contain only R, AL, AN, EN, ES,
> >>> CS, ET, ON, BN or NSM.  Note that the only strong characters in this label
> >>> are AL and R (L is not allowed and sor is excluded because the first
> >>> character must be AL or R).  Given that, no EN can resolve to an L
> >>> [http://unicode.org/reports/tr9/#W7], therefore all AN and EN will resolve
> >>> to the same levels.
> >>>
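The two constraints on an RTL label described above can be written as a small check. This is a sketch of the rules as stated in this message only, not of any final specification; the function name check_rtl_label and the message strings are made up for illustration.

```python
import unicodedata

# Classes an RTL label may contain, as listed in the paragraph above.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def check_rtl_label(label):
    """Return a list of problems found; an empty list means both checks pass."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    problems = []
    if not classes or classes[0] not in ("R", "AL"):
        problems.append("first character is not R or AL")
    disallowed = sorted(set(classes) - RTL_ALLOWED)
    if disallowed:
        problems.append("disallowed bidi classes: " + ", ".join(disallowed))
    return problems


print(check_rtl_label("\u05D41\u0661-1"))  # R EN AN ES EN
print(check_rtl_label("1\u0661\u05D4"))    # EN AN R, starts with EN
```

The second call corresponds to the EN AN example discussed earlier, which fails the first rule because it does not begin with a strong RTL character.
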
> >>> Or perhaps I am missing something?
> >>>
> >>> James Mitchell
> >>> _______________________________________________
> >>> Idna-update mailing list
> >>> Idna-update at alvestrand.no
> >>> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>>
> >>>