Comments on IDNA Bidi
vint at google.com
Wed Jan 16 12:27:19 CET 2008
I think providing specific examples will be very useful for each
instance of a "problem" that we encounter. It will help focus
attention and aid understanding.
On Jan 16, 2008, at 1:08 AM, Michel Suignard wrote:
>> From: Harald Alvestrand [mailto:harald at alvestrand.no]
>> Sent: Tuesday, January 15, 2008 12:06 AM
>> The investigations I've done so far tell me that the IDNA2003 rules
>> are far too weak, especially when it comes to the handling of AN
>> (Arabic Number) - there are MANY strings that people will want to be
>> valid in some context that will break in "interesting" ways when the
>> Unicode BIDI algorithm is applied to a paragraph containing such a
>> string used in the ways we usually use a domain name. So in addition
>> to allowing trailing NSM, we need to tighten up the rules in other
>> ways.
>> I'm not going to recommend any specific ruleset until I have some
>> confidence that this ruleset actually specifies a set of strings that
>> is both "safe" (passes some reasonable set of tests) and useful.
> My take on this is that if something breaks in 'interesting' ways,
> we can't allow it in a domain. This is why we now have the
> encapsulation requirement with RCat, to avoid interaction between
> labels across weak separators and not-so-funny side effects with
> the AN characters. So in the balance between useful and safe we
> always have to pick the safe path.
> I am not sure I fully understand why you think that the current
> rules are far too weak. As is often the case, they are a compromise,
> and they were crafted/reviewed by bidi experts such as Matti
> Allouche, Jonathan Rosenne, Martin Duerst and others. They are also
> used in the IRI RFC and are discussed in even more detail there. It
> is interesting to note that an update to IDNA will probably require
> a similar one to IRI, because the problems are very similar between
> a domain name and a resource identifier (a mix of strong and weak
> bidi types). It is true that the scenario involving ending combining
> marks (NSM) was missed, probably because there were no Yiddish and
> Dhivehi speakers in the reviewer list at that time.
> In fact I think that as of now the bidi rules are almost 'too'
> tight and are limiting the usage side which is unfortunate but
> necessary for security reasons. So I am not sure how we could make
> them even tighter. But w/o concrete cases I am probably speculating.
> This is why I am in favor of minor but essential changes to the rules
> that mostly preserve what we have now. And it was my belief that
> the current rules had been reasonably validated, with few exceptions
> such as the case of ending combining marks.
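As one concrete case of the kind asked for above, here is a rough
sketch of the IDNA2003 bidi requirement (RFC 3454, section 6) applied
to a label that ends in a combining mark. It is only an approximation:
the function name passes_idna2003_bidi is made up for illustration,
the prohibited-character step is left out, and Python's unicodedata
uses the current Unicode character database rather than the Unicode
3.2 tables that RFC 3454 specifies.

```python
import unicodedata

# Bidi classes that make up RandALCat in RFC 3454 (approximated here
# with the current Unicode database, not the Unicode 3.2 tables).
RANDAL = ("R", "AL")


def passes_idna2003_bidi(label: str) -> bool:
    """Approximate the RFC 3454 section 6 bidi check for one label.

    Hypothetical helper for illustration only; it skips the
    prohibited-character step and looks only at bidi classes.
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]

    # Labels with no right-to-left characters are not restricted.
    if not any(c in RANDAL for c in classes):
        return True

    # A label with RandALCat characters must not contain any LCat character.
    if "L" in classes:
        return False

    # The first and last characters must both be RandALCat.
    return classes[0] in RANDAL and classes[-1] in RANDAL


# Hebrew word made only of letters (all class R): accepted.
print(passes_idna2003_bidi("\u05e9\u05dc\u05d5\u05dd"))

# Dhivehi word ending in the vowel sign U+07A8 (class NSM): rejected,
# because the last character is not RandALCat.
print(passes_idna2003_bidi("\u078b\u07a8\u0788\u07ac\u0780\u07a8"))
```

The second label is an ordinary Dhivehi word, which is why the
trailing NSM case matters in practice. The sketch says nothing about
the AN cases; those depend on how the full Unicode bidi algorithm
orders a whole paragraph, which a per-label class check cannot show.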