Request for updated example highlighting problem of mixing of AN and EN
saleh at nic.ir
Tue Aug 18 11:54:14 CEST 2009
There is at least one example, sent by Harald:
"CS R EN AN ES EN CS (.<alef><latin 1><arabic 1>-<latin 1>.) will
rearrange into the same sequence as CS R EN ES EN AN CS
(.<alef><latin 1>-<latin 1><arabic 1>.)"
The specification of rule N1 in UAX#9 is not very clear, and this causes
some inconsistency among the applications that implement it. This has
been reported to Unicode. At that time I believed that, with a correct
interpretation of N1 and some clarifying examples, there would be nothing
to worry about in mixing AN and EN. I think the current draft change to
UAX#9 tries to fix the bug by following the implementations rather than
by interpreting the text correctly. However, we can rely on rule W2 of
UAX#9, which says:
' W2. Search backward from each instance of a European number until the
first strong type (R, L, AL, or sor) is found. If an AL is found, change
the type of the European number to Arabic number.'
Or, put simply: as long as there is no R in the bidi label, we can mix AN
and EN.
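
To make the W2 point concrete, here is a minimal sketch of that one rule
in Python. It is only my reading of the quoted text, not a full UAX#9
implementation: it assumes the input is a list of bidi class names with
W1 already applied, and the names apply_w2 and STRONG are made up for
this illustration.

```python
# Strong bidi classes named in rule W2 (sor is passed in separately).
STRONG = {"R", "L", "AL"}


def apply_w2(types, sor="L"):
    """Return a copy of `types` with rule W2 applied.

    `types` is a list of bidi class names such as "AL", "EN", "AN".
    `sor` is the start-of-run type, used when no strong type precedes
    an EN. Scanning forward while remembering the most recent strong
    type is equivalent to searching backward from each EN, because W2
    never changes a strong type.
    """
    out = list(types)
    last_strong = sor
    for i, t in enumerate(types):
        if t in STRONG:
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            out[i] = "AN"  # EN after AL becomes AN
    return out


if __name__ == "__main__":
    # Label with AL and no R: every EN is retyped as AN.
    print(apply_w2(["AL", "EN", "AN", "ES", "EN"]))
    # Label starting with R: the EN types are left alone.
    print(apply_w2(["R", "EN", "AN", "ES", "EN"]))
```

Under this reading, in a label whose only strong type is AL every EN
ends up as AN after W2, so EN and AN can no longer be told apart by the
later rules. With an R in front, the EN stays EN, which is the case
Harald's example is about.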
The UAX#31 has been implemented for using ZWNJ in Arabic-Script.
Thank you, Erik!
James Mitchell wrote:
> The only concrete example I have found that justifies the prohibition of
> mixing AN and EN is CS EN AN CS R in an LTR context.
> Under the current bidi rules, plus changes from a subsequent email from
> Mark, an AN will require the label to be treated as RTL.
> Therefore, a label mixing AN and EN will be treated as an RTL label. The
> above example (EN AN) will violate the first bidi rule, that a label must
> begin with L, R or AL.
> Is there a concrete example that is otherwise IDNA-valid?
> From my understanding of the bidirectional algorithm and the current bidi
> rules, there is no otherwise covered case where mixing AN and EN leads to a
> label that violates the requirements (as distinct from the rules) of bidi. As
> stated earlier, a label containing an AN is an RTL label. An RTL label must
> start with an AL or R (rule 1) and must contain only R, AL, AN, EN, ES, CS,
> ET, ON, BN or NSM. Note that the only strong characters in this label are
> AL and R (L is not allowed and sor is excluded because the first character
> must be AL or R). Given that, no EN can resolve to an L
> [http://unicode.org/reports/tr9/#W7], therefore all AN and EN will resolve
> to the same levels.
> Or perhaps I am missing something?
> James Mitchell
> Idna-update mailing list
> Idna-update at alvestrand.no