Comments on IDNA Bidi

Wed Jan 16 12:27:19 CET 2008

I think providing specific examples will be very useful for each  
instance of a "problem" that we encounter. It will help focus  
attention and aid understanding.

vint

On Jan 16, 2008, at 1:08 AM, Michel Suignard wrote:

>> From: Harald Alvestrand [mailto:harald at alvestrand.no]
>> Sent: Tuesday, January 15, 2008 12:06 AM
>>
>> The investigations I've done so far tell me that the IDNA2003 rules
>> are far too weak, especially when it comes to the handling of AN
>> (Arabic Number) - there are MANY strings that people will want to have
>> valid in some context that will break in "interesting" ways when the
>> Unicode BIDI algorithm is applied to a paragraph containing such a
>> string used in the ways we usually use a domain name. So in addition to
>> allowing trailing NSM, we need to tighten up the rules in other ways.
>>
>> I'm not going to recommend any specific ruleset until I have some
>> confidence that this ruleset actually specifies a set of strings that
>> is both "safe" (passes some reasonable set of tests) and useful.
>
> Harald,
> My take on this is that if something breaks in 'interesting' ways,
> we can't allow it in a domain. This is why we now have the
> encapsulation requirement with RCat: to avoid interaction between
> labels across weak separators and not-so-funny side effects with
> the AN characters. So in the balance between useful and safe, we
> always have to pick the safe path.
>
> I am not sure I fully understand why you think that the current
> rules are far too weak. As is often the case, they are a compromise
> and were crafted/reviewed by bidi experts such as Matti Allouche,
> Jonathan Rosenne, Martin Duerst and others. They are also used in
> the IRI RFC and have even more detailed considerations there. It is
> interesting to note that an update in IDNA will probably require a
> similar one in IRI, because the problems are very similar between a
> domain name and a resource identifier (mix of strong and weak bidi
> types). It is true that the scenario involving ending combining
> marks (NSM) was missed, probably because there were no Yiddish and
> Dhivehi speakers in the reviewer list at that time.
>
> In fact, I think that as of now the bidi rules are almost 'too'
> tight and are limiting the usage side, which is unfortunate but
> necessary for security reasons. So I am not sure how we could make
> them even tighter. But w/o concrete cases I am probably speculating.
>
> This is why I am in favor of minor but essential changes to the
> rules that mostly preserve what we have now. And it was my belief
> that the current rules had been reasonably validated, with few
> exceptions such as the case of ending combining marks.
>
> Michel
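
As one concrete instance of the ending-combining-mark case mentioned above, here is a minimal sketch of the bidi check that IDNA2003 inherits from RFC 3454, section 6, as I understand it: a label containing any RandALCat character must contain no LCat character, and must both start and end with a RandALCat character. The function name idna2003_bidi_ok and the two sample labels are my own, chosen only for illustration. The table lookups come from Python's standard stringprep module (tables D.1 and D.2).

```python
import stringprep  # standard library; implements the RFC 3454 tables


def idna2003_bidi_ok(label: str) -> bool:
    """Sketch of the RFC 3454 section 6 bidi check (hypothetical helper)."""
    # Table D.1 holds the RandALCat characters (bidi class R or AL).
    rand_al = [stringprep.in_table_d1(ch) for ch in label]
    if not any(rand_al):
        return True  # no right-to-left characters, so the rule does not apply
    # Table D.2 holds the LCat characters (bidi class L).
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False  # RandALCat and LCat mixed in one label
    # First and last characters must both be RandALCat.
    return rand_al[0] and rand_al[-1]


# Illustrative labels (assumptions, not taken from the thread):
hebrew = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters only
thaana = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"  # Thaana, ends in vowel sign U+07A8 (NSM)

print(idna2003_bidi_ok(hebrew))  # True
print(idna2003_bidi_ok(thaana))  # False: last character is NSM, not RandALCat
```

The Thaana label fails only because its final character is a nonspacing mark, which is the situation a Dhivehi speaker would run into with ordinary words. The sketch says nothing about the AN cases; those need their own examples.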