Protocol Action: 'Right-to-left scripts for IDNA' to Proposed Standard

Sun Feb 14 21:25:56 CET 2010

Mark,

Thanks for this. My sense is that almost anything we try to do is
defeated in cut-and-paste scenarios, where context may be lost.

vint

On Feb 14, 2010, at 3:23 PM, Mark Davis ☕ wrote:

> A few comments on remarks here:
>
> > Well, as we know, the IDNA protocol didn't adopt the bidi algorithm
> (UAX #9) fully. It disallows all bidi markers (LRM, RLM, ...), which
> are used to solve problems of this kind.
>
> > Well, I don't think so. It can be done in UAX#9 (well, if URIs have
> their own rules): UAX#9 does know about the nature of characters
> (neutral, RTL, LTR, weak, ..), the context direction, etc., and thus
> there are possible ways to fix these issues in UAX#9 rather than in
> IDNA itself.
>
> Changing UAX#9 (aka UBA) at this point would be very difficult,  
> because of stability concerns. We've seen before where very minor  
> changes to it have caused many problems for users, because it  
> changes the layout of existing documents. While not impossible, one  
> would have to make a very good case for the change, and be prepared  
> to demonstrate, with compelling data, that the benefit would be  
> worth the cost.
>
> The UBA was designed for plain text, not special syntax. And no  
> matter how it was structured, it was always clear that one would  
> need to be able to override the default; to that end, the marks and  
> overrides were added. Because those are disallowed in IDNA, however,
> this tool is not available. They were disallowed in IDNA because
> they make it possible to construct, artificially, very confusable
> IRIs.
>
> (BTW Looking back at it, one of the problems with the UBA was that  
> it tried to do too much. There is a tension between heuristics and  
> predictability, and if we could go back in time and redo it, one of  
> the things I'd change would be to reduce the heuristics, especially  
> around numbers, so as to make it more predictable for users.)
>
> However, it is possible, and conformant to the UBA, to have a
> higher-level protocol that reorders labels in a domain name, in the
> path, and in the query, because the UBA specifically allows for such
> specialized overrides. So you could take the following internal
> string, with characters listed from left to right
>
> http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P
>
> and have them display
>
> ...F/e/d.C.B.a//:http
>
> This would be possible, but is not necessarily a good idea. The
> problem comes in the interaction between (a) environments that look
> for IRIs and handle them this way, and (b) environments that don't
> parse for IRIs, or don't recognize them or their fragments, or don't
> display them in the 'new' way once they have them. There is already
> the issue of display being different in RTL vs LTR paragraphs; you
> don't want typing in one environment within RTL to give yet different
> results than in another within RTL.
>
> And we know that recognizing IRIs (and fragments thereof) occurring  
> in plain text is difficult. You don't want PAYPAL.JOE.com to appear  
> as PAYPAL.JOE.com in my email, and JOE.PAYPAL.com in the address  
> bar, and so on.
>
> So any design for having a special ordering for IRI BIDI elements  
> has to take a host of issues into account. I'm not saying that it  
> can't be done, but it is a big job, and any transition has to be
> extremely carefully considered. Various people in Unicode have  
> considered it at one time or another, but we've just never seen a  
> clear path forward.
>
> Mark
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
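
The component reordering described in the quoted message can be modeled
with a minimal sketch. It is only an illustration under stated
assumptions, not a bidi implementation: the function name
rtl_component_order and the delimiter set are hypothetical, uppercase
Latin letters stand in for RTL characters as in the message, and the
characters inside each component are left in their original order
(a real renderer would also apply the UBA within each run).

```python
import re

# Hypothetical delimiter set: the URL punctuation used in the example.
_DELIMS = re.compile(r"([:/?#&=.])")


def rtl_component_order(url: str) -> str:
    """Return the components of `url` laid out right to left.

    Each component (label, path segment, query name or value) keeps its
    internal character order; only the sequence of components and
    delimiters is reversed.
    """
    tokens = _DELIMS.split(url)  # the capturing group keeps the delimiters
    return "".join(reversed(tokens))


logical = "http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P"
print(rtl_component_order(logical))
# P=o&n=M&l=K&J=i?h/G/F/e/d.C.B.a//:http
```

The output ends in F/e/d.C.B.a//:http, which matches the tail shown in
the message. An environment that does not apply such a rule would show
the same logical string in a different order, which is the mismatch the
message warns about.
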
