Protocol Action: 'Right-to-left scripts for IDNA' to Proposed Standard

Vint Cerf vint at
Sun Feb 14 21:25:56 CET 2010


thanks for this - my sense is that almost anything we try to do is  
defeated in cut and paste scenarios where context may be lost.


On Feb 14, 2010, at 3:23 PM, Mark Davis ☕ wrote:

> A few comments on remarks here:
> >Well, as we know, the IDNA protocol didn't adopt the bidi algorithm
> (UAX #9) fully. It disallowed all bidi marks (LRM, RLM, ...), which
> are what is used to solve problems of this kind.
> > Well, I don't think so; it can be done in UAX#9 (well, if URI has
> its own rules). UAX#9 does know about the nature of characters
> (neutral, RTL, LTR, weak, ...), the context direction, etc., and thus
> there are possible ways to fix these issues in UAX#9 rather than in
> IDNA itself.
> Changing UAX#9 (aka UBA) at this point would be very difficult,  
> because of stability concerns. We've seen before that very minor
> changes to it have caused many problems for users, because they
> change the layout of existing documents. While not impossible, one
> would have to make a very good case for the change, and be prepared  
> to demonstrate, with compelling data, that the benefit would be  
> worth the cost.
> The UBA was designed for plain text, not special syntax. And no  
> matter how it was structured, it was always clear that one would  
> need to be able to override the default; to that end, the marks and  
> overrides were added. Because those are disallowed in IDNA, however,
> this tool is not available. They were disallowed in IDNA because
> they create an opportunity for artificially constructing very
> confusable IRIs.
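
To make the character classes mentioned above concrete, here is a
minimal sketch using Python's standard unicodedata module, which
reports each character's bidi class from the Unicode Character
Database. The sample characters and the names samples, bidi_class_of
and classes are picked only for illustration; they do not come from
the thread.

```python
import unicodedata

# Illustrative sample characters (assumptions, not taken from the thread).
samples = {
    "Latin letter a": "a",
    "Hebrew letter alef": "\u05D0",
    "Arabic letter alef": "\u0627",
    "digit one": "1",
    "LRM (left-to-right mark)": "\u200E",
    "RLM (right-to-left mark)": "\u200F",
    "RLO (right-to-left override)": "\u202E",
}


def bidi_class_of(ch: str) -> str:
    """Return the Unicode bidi class abbreviation for a single character."""
    return unicodedata.bidirectional(ch)


# Map each sample's description to its bidi class.
classes = {name: bidi_class_of(ch) for name, ch in samples.items()}

for name, cls in classes.items():
    print(f"{name:30} {cls}")
```

L, R and AL are the strong types; the digit comes back as EN, one of
the weak types. LRM and RLM are invisible characters that carry a
strong direction, which is what lets them override the default
ordering in plain text.
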
> (BTW Looking back at it, one of the problems with the UBA was that  
> it tried to do too much. There is a tension between heuristics and  
> predictability, and if we could go back in time and redo it, one of  
> the things I'd change would be to reduce the heuristics, especially  
> around numbers, so as to make it more predictable for users.)
> However, it is possible and conformant to UBA to have a higher level  
> protocol that reorders labels in a domain name, and in the path, and  
> in the query, because it allows for such specialized overrides  
> specifically. So you could take the following internal string with  
> characters from left to right
> http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P
> and have them display
> ...F/e/d.C.B.a//:http
> This would be possible, but is not necessarily a good idea. The
> problem comes in the interaction between (a) environments that look
> for IRIs and handle them this way, and (b) environments that don't
> parse for IRIs, or don't recognize them or their fragments, or
> don't display them in the 'new' way once they have them. There is
> already the issue of display being different in RTL vs LTR  
> paragraphs; you don't want typing in one environment within RTL to  
> give yet different results than in another within RTL.
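
The component reordering described above can be sketched roughly as
follows. This is only an illustration under simplifying assumptions,
not a proposal from the thread: it treats every one of the characters
: / ? # & = . as a separate delimiter, treats everything between
delimiters as one opaque component, and reverses the sequence of
components for display. It does not model how characters inside an
RTL component would themselves be ordered. The function name
display_order_rtl and the variable names are hypothetical.

```python
import re

# Delimiters assumed for this sketch; a real IRI parser would be stricter.
_TOKEN = re.compile(r"[^:/?#&=.]+|[:/?#&=.]")


def display_order_rtl(iri: str) -> str:
    """Reverse the order of components and delimiters, leaving the
    characters inside each component in their original order."""
    tokens = _TOKEN.findall(iri)
    return "".join(reversed(tokens))


internal = "http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P"
shown = display_order_rtl(internal)
print(shown)  # P=o&n=M&l=K&J=i?h/G/F/e/d.C.B.a//:http
```

The tail of the result matches the "...F/e/d.C.B.a//:http" display
given in the example. A plain-text environment that does not
recognize the string as an IRI would never apply this step, which is
the mismatch between environments discussed above.
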
> And we know that recognizing IRIs (and fragments thereof) occurring  
> in plain text is difficult. You don't want to appear  
> as in my email, and in the address  
> bar, and so on.
> So any design for having a special ordering for IRI BIDI elements  
> has to take a host of issues into account. I'm not saying that it  
> can't be done, but it is a big job, and any transition has to be
> extremely carefully considered. Various people in Unicode have  
> considered it at one time or another, but we've just never seen a  
> clear path forward.
> Mark
> _______________________________________________
> Idna-update mailing list
> Idna-update at
