bidi spec

Harald Alvestrand harald at alvestrand.no
Thu Feb 7 07:27:35 CET 2008


Erik van der Poel wrote:
> It turns out that ICU4C has an option called
> UBIDI_KEEP_BASE_COMBINING, and this did the trick (putting the base
> character and combining mark in the "correct" order). When I did that,
> I found that I no longer needed the new rule that I mentioned in my
> previous email. You're also right that "If an R, AL or AN is present,
> no L may be present" is simply redundant (same as the previous rule).
> So here are all the rules, with "Keep" and "Remove" indicated:
>
> Keep:
> Only characters with the BIDI properties L, R, AL, EN, ES, BN, ON
> and NSM are allowed.
>
> Keep:
> ES and ON are not allowed in the first position
>
> Keep:
> ES and ON, followed by zero or more NSM, is not allowed in the
> last position
>
> Keep:
> If an L is present, no R, AL or AN may be present.
>
> Remove (redundant):
> If an R, AL or AN is present, no L may be present.
>
> Keep:
> If an EN is present, no AN may be present
>
> Remove (redundant):
> If an AN is present, no EN may be present
>
> Remove:
> If an AN is present, at least one R or AL must be present
>
> Keep:
> The first character may not be an NSM
>
> Keep:
> The first character may not be an EN (European Number) or an AN
> (Arabic Number).
>
> Harald, are you also able to remove the rules that I marked "Remove"
> above, and still have the tests pass?
Yes. With those rules removed, the tests still pass on the first try
for labels of up to 4 characters, with surrounding labels of 0 to 1
characters.
Excellent!

              Harald
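
For readers following along, the "Keep" rules quoted above can be
sketched as a small checker. This is only an illustration, not the test
program discussed in this thread: the function name check_label, the
rule labels and the "allowed" parameter are hypothetical, and bidi
classes are taken from Python's unicodedata module. The allowed set is
copied as quoted, so it does not contain AN; the AN checks can therefore
only fire on their own if a caller passes a wider set.

```python
import unicodedata

# Allowed bidi classes, copied from the first "Keep" rule as quoted.
# AN is not in the quoted list; pass a wider `allowed` set (an assumption
# made for illustration) to exercise the AN rules in isolation.
ALLOWED = frozenset({"L", "R", "AL", "EN", "ES", "BN", "ON", "NSM"})


def check_label(label, allowed=ALLOWED):
    """Return the list of violated "Keep" rules (empty if none)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    errors = []
    if not classes:
        return errors
    present = set(classes)

    # Only characters with the listed bidi properties are allowed.
    if not present <= allowed:
        errors.append("disallowed-class")

    # ES and ON are not allowed in the first position.
    if classes[0] in ("ES", "ON"):
        errors.append("es-on-first")

    # ES and ON, followed by zero or more NSM, not allowed at the end.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    if end > 0 and classes[end - 1] in ("ES", "ON"):
        errors.append("es-on-last")

    # If an L is present, no R, AL or AN may be present.
    if "L" in present and present & {"R", "AL", "AN"}:
        errors.append("l-with-r-al-an")

    # If an EN is present, no AN may be present.
    if "EN" in present and "AN" in present:
        errors.append("en-with-an")

    # The first character may not be an NSM.
    if classes[0] == "NSM":
        errors.append("nsm-first")

    # The first character may not be an EN or an AN.
    if classes[0] in ("EN", "AN"):
        errors.append("en-an-first")

    return errors
```

For example, check_label("abc") returns an empty list, while
check_label("abc-") reports "es-on-last" because HYPHEN-MINUS has bidi
class ES.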


