Comments on IDNA Bidi
michelsu at windows.microsoft.com
Wed Jan 16 07:08:25 CET 2008
> From: Harald Alvestrand [mailto:harald at alvestrand.no]
> Sent: Tuesday, January 15, 2008 12:06 AM
> The investigations I've done so far tell me that the IDNA2003 rules are
> far too weak, especially when it comes to the handling of AN
> (Arabic Number) - there are MANY strings that people will want to have
> valid in some context that will break in "interesting" ways when the
> Unicode BIDI algorithm is applied to a paragraph containing such a
> string used in the ways we usually use a domain name. So in addition to
> allowing trailing NSM, we need to tighten up the rules in other ways.
> I'm not going to recommend any specific ruleset until I have some
> confidence that this ruleset actually specifies a set of strings that
> is both "safe" (passes some reasonable set of tests) and useful.
My take on this is that if something breaks in 'interesting' ways, we can't allow it in a domain name. This is why we now have the encapsulation requirement with RCat: to avoid interaction between labels across weak separators, and the not-so-funny side effects involving AN characters. So in the balance between useful and safe, we always have to pick the safe path.
I am not sure I fully understand why you think the current rules are far too weak. As is often the case, they are a compromise, and they were crafted and reviewed by bidi experts such as Matti Allouche, Jonathan Rosenne, Martin Duerst and others. They are also used in the IRI RFC, which has even more detailed considerations. It is interesting to note that an update to IDNA will probably require a similar one to IRI, because the problems are very similar for a domain name and a resource identifier (a mix of strong and weak bidi types). It is true that the scenario involving ending combining marks (NSM) was missed, probably because there were no Yiddish or Dhivehi speakers among the reviewers at that time.
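To make the encapsulation requirement and the trailing-NSM gap concrete, here is a minimal sketch of the IDNA2003 bidi check (RFC 3454, section 6, rules 2 and 3). It assumes that Python's `unicodedata.ucd_3_2_0` bidi classes R/AL and L are an adequate stand-in for tables D.1 and D.2, and it omits rule 1 (prohibited characters). The `allow_trailing_nsm` switch is a hypothetical relaxation for illustration only, not a rule taken from any RFC.

```python
import unicodedata

# IDNA2003/stringprep is pinned to Unicode 3.2, so query that database version.
UCD = unicodedata.ucd_3_2_0


def bidi_class(ch: str) -> str:
    return UCD.bidirectional(ch)


def is_randal(ch: str) -> bool:
    # RandALCat: bidi class R or AL (approximates RFC 3454 table D.1)
    return bidi_class(ch) in ("R", "AL")


def is_lcat(ch: str) -> bool:
    # LCat: bidi class L (approximates RFC 3454 table D.2)
    return bidi_class(ch) == "L"


def passes_bidi_rule(label: str, allow_trailing_nsm: bool = False) -> bool:
    """Check RFC 3454 section 6 rules 2 and 3 for one label.

    allow_trailing_nsm is a HYPOTHETICAL relaxation: trailing NSM
    characters are skipped before checking the last-character requirement.
    """
    if not any(is_randal(ch) for ch in label):
        return True  # no RandALCat characters: rules 2 and 3 do not apply
    if any(is_lcat(ch) for ch in label):
        return False  # rule 2: RandALCat and LCat must not be mixed
    end = len(label)
    if allow_trailing_nsm:
        while end > 0 and bidi_class(label[end - 1]) == "NSM":
            end -= 1
    # rule 3 (encapsulation): first and last character must be RandALCat
    return end > 0 and is_randal(label[0]) and is_randal(label[end - 1])
```

For example, a Hebrew label ending in a vowel point, such as alef + patah (`"\u05D0\u05B7"`), fails the strict check because U+05B7 is NSM, and passes only with the relaxation. An Arabic label ending in an Arabic-Indic digit (`"\u0627\u0644\u0661"`, last character AN) also fails, since AN is not RandALCat.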
In fact, I think that as of now the bidi rules are almost 'too' tight and limit usage, which is unfortunate but necessary for security reasons. So I am not sure how we could make them even tighter. But without concrete cases I am probably speculating.
This is why I am in favor of minor but essential changes to the rules that mostly preserve what we have now. It was my belief that the current rules had been reasonably validated, with few exceptions such as the case of ending combining marks.