Comments on IDNA Bidi

Tue Jan 15 09:06:09 CET 2008

Michel Suignard wrote:
> Just a reminder that the bidi rules I proposed in my message of 11/30/2007 to this list (excerpt below, with some further editing) do not require implementing the bidi algorithm; they rely only on bidi properties and some positional conditions, and are much simpler to define and implement than the bidi algorithm itself. As such, they are a mere update of the rules expressed in clause 6 of RFC 3454 (stringprep) and can be used as a processing rule in the idna200x protocol definition. I understand that validating these rules as appropriate may require running the bidi algorithm on strings that comply with them, but that is only a validation or proof-of-concept issue, not an implementation issue.
>
> The rules I wrote then also implied that the bidi control characters excluded in idna2003 remain excluded in this new context.

I think I agree with you. A set of properties that can be checked
without running the Bidi algorithm is much better than a set of
properties that require running the Bidi algorithm.
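
To make "checked without running the Bidi algorithm" concrete, here is a minimal sketch of the two positional requirements from clause 6 of RFC 3454, using only per-character bidi classes. The function name is made up for illustration, and the sketch assumes the prohibited bidi control characters have already been rejected in an earlier step. It is not the ruleset proposed in the quoted message.

```python
import unicodedata

# RFC 3454 calls characters with bidi class R or AL "RandALCat"
# and characters with bidi class L "LCat".
RANDALCAT = ("R", "AL")


def satisfies_rfc3454_bidi(label: str) -> bool:
    """Check the RandALCat/LCat requirements of RFC 3454 clause 6.

    Hypothetical helper: assumes prohibited characters (including the
    bidi controls) were already filtered out by the caller.
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]

    # A label with no right-to-left characters is not constrained.
    if not any(c in RANDALCAT for c in classes):
        return True

    # A label containing RandALCat must not contain any LCat character.
    if "L" in classes:
        return False

    # A RandALCat character must be both the first and the last character.
    return classes[0] in RANDALCAT and classes[-1] in RANDALCAT
```

The check is a single pass over the label with a property lookup per character; no embedding levels or reordering are computed.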

The investigations I've done so far tell me that the IDNA2003 rules are
far too weak, especially when it comes to the handling of AN (Arabic
Number) - there are MANY strings that people will want to have valid in
some context that will break in "interesting" ways when the Unicode BIDI
algorithm is applied to a paragraph containing such a string used in the
ways we usually use a domain name. So in addition to allowing trailing
NSM, we need to tighten up the rules in other ways.
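
As a rough illustration of the "allowing trailing NSM" part only, the sketch below relaxes the RFC 3454 last-character requirement so that trailing nonspacing marks are skipped before the check. The helper names are hypothetical, and this is one possible reading of the relaxation, not a proposed ruleset. It deliberately adds no tightening for AN, so a label with an Arabic-Indic digit in the interior is still accepted here; whether such strings display safely is exactly what has to be tested against the Bidi algorithm.

```python
import unicodedata

RANDALCAT = ("R", "AL")  # RFC 3454 "RandALCat": bidi class R or AL


def bidi_classes(label: str) -> list:
    """Return the Unicode bidi class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def passes_with_trailing_nsm(label: str) -> bool:
    """RFC 3454 clause 6 check, relaxed to ignore trailing NSM.

    Hypothetical sketch: assumes prohibited characters were already
    rejected, and applies no extra restriction on AN or EN.
    """
    classes = bidi_classes(label)
    if not any(c in RANDALCAT for c in classes):
        return True
    if "L" in classes:
        return False

    # Skip nonspacing marks at the end of the label.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1

    return classes[0] in RANDALCAT and classes[end - 1] in RANDALCAT
```

With this relaxation a Hebrew letter followed by a vowel point is accepted, where the unmodified RFC 3454 rule rejects it because the last character has class NSM.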

I'm not going to recommend any specific ruleset until I have some
confidence that this ruleset actually specifies a set of strings that is
both "safe" (passes some reasonable set of tests) and useful.

                          Harald