Comments on IDNA Bidi

Mark Davis mark.davis at
Tue Jan 15 01:17:59 CET 2008

I think Harald is trying to work at multiple levels:

   1. specify the mechanism
   2. determine and describe the requisite properties for the mechanism
   (e.g., that characters don't "hop" between different labels)
   3. describe how to mechanically verify that the mechanism has the
   requisite properties.

The last two parts might be a bit more complicated, but very few people
would actually need to be concerned with them. And only (2) would require
use of the bidi algorithm.

Once we have #2 and #3, we can verify that Michel's method and/or
Harald's method works, tweak if necessary, and go with the simplest one.


On Jan 14, 2008 2:40 PM, Kenneth Whistler <kenw at> wrote:

> Michel said:
> > Just a reminder that the bidi rules that I proposed in my message
> > dated 11/30/2007 to this list (excerpt below with some further
> > editing) do not require implementing the bidi algorithm but rely
> > only on bidi properties and some positional conditions, and are
> > much simpler to define and implement than the bidi algorithm itself.
> > As such they are a mere update of the rules expressed in clause
> > 6 of RFC 3454 (stringprep) and can be used as a processing rule in
> > the idna200x protocol definition.
> I want to second Michel's approach here.
> If you look at the current text of bidi-02.txt, the
> proposed fix for RFC 3454 is to rewrite the definition of
> RandALCat character and LCat character as follows:
>  For characters that have category "R", "AL" or "L", the
>  category is fixed (UAX#9 defines them as having "strong"
>  category);...
> Note that that much is unchanged in Michel's textual approach.
>  ... for characters in category EN, ES, ET, AN, CS, NSM, BN, B,
>  S, WS and ON, the category is determined by applying the
>  algorithm described in UAX#9 section 3.3 to the string.
> But here, Michel's approach is much simpler. It focusses on
> the main problem noted in RFC 3454, the problem of not
> allowing labels to end with combining marks -- a problem
> that was disallowing well-formed Dhivehi and Yiddish labels,
> for example. That is also the main problem discussed
> in Section 1 (and exemplified in Section 2) of bidi-02.txt.
> Since the categorical treatment of bc=NSM characters is
> trivial in the bidirectional algorithm, and doesn't imply
> full application of the algorithm to understand and specify
> it, simply adding the definition of NSMCat characters, and
> tightening up the specification of allowable label strings,
> to include the appropriate use of the NSMCat values, is
> much, much simpler than requiring an actual application
> of the full bidirectional algorithm to determine the
> final, contextual resolution of weak types (X3.3.3) and
> neutral types (X3.3.4).
> Also, as somewhat of an aside, the current proposed wording
> above in bidi-02.txt is overly broad in what it attempts
> to accomplish, even if retained. bc=BN, while a weak
> bidi type, is never resolved to a strong type by X3.3.3;
> by rule X9 all BN codes are logically removed from the
> string *before* any resolution of weak types. And bc=B,
> bc=S, and bc=WS can never occur in IDN labels in the
> first place, so you don't have to deal with the complications
> of rule L1 for those, either. And finally, bc=AN never
> gets resolved to one of the strong types. So at the very least, the
> statement could be simplified to:
>  ... for characters in category EN, ES, ET, CS, NSM,
>  and ON, the category is determined by applying the
>  algorithm described in UAX#9 section 3.3 to the string.
> But I think Michel's focus on just dealing with bc=NSM is
> much cleaner and still suffices to deal with the problem.
> --Ken
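
To make the property-only approach discussed above concrete, here is a
minimal sketch in Python. It is not Michel's actual rule text, which is not
reproduced in this message. It assumes the clause 6 rules of RFC 3454 (no
mixing of RandALCat and LCat characters; a RandALCat character first and
last) plus one hypothetical relaxation: bc=NSM characters may trail the
final RandALCat character. The function name check_label and the flag
allow_trailing_nsm are made up for illustration. Bidi classes come from
Python's unicodedata module, so they reflect whatever Unicode version the
interpreter ships with.

```python
import unicodedata

# Bidi classes that RFC 3454 groups as "RandALCat".
RANDAL = {"R", "AL"}


def check_label(label, allow_trailing_nsm=True):
    """Positional check on one label using bidi properties only.

    No part of the bidi algorithm is run. With allow_trailing_nsm=False
    this approximates the original clause 6 rule; with True it applies
    the assumed NSM relaxation described in the lead-in.
    """
    if not label:
        return False
    classes = [unicodedata.bidirectional(ch) for ch in label]

    # Labels without any R or AL character are not constrained here.
    if not any(c in RANDAL for c in classes):
        return True

    # Clause 6: a string with RandALCat must not contain LCat.
    if "L" in classes:
        return False

    # Clause 6: the first character must be RandALCat.
    if classes[0] not in RANDAL:
        return False

    # Assumed relaxation: skip combining marks (NSM) at the very end
    # before requiring the last character to be RandALCat.
    end = len(classes)
    if allow_trailing_nsm:
        while end > 0 and classes[end - 1] == "NSM":
            end -= 1
    return classes[end - 1] in RANDAL


# "Dhivehi" written in Thaana; the final character U+07A8 is bc=NSM.
dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"
print(check_label(dhivehi, allow_trailing_nsm=False))
print(check_label(dhivehi, allow_trailing_nsm=True))
```

The Thaana example ends in a vowel sign, which is the kind of label the
original rule rejects and the relaxed rule accepts. The sketch says nothing
about the "hop" property; verifying that would still need the full algorithm.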