Protocol Action: 'Right-to-left scripts for IDNA' to Proposed Standard

Mon Feb 15 15:50:10 CET 2010

please keep in mind that presentation and network order are, as you  
know, not always aligned and that the protocols have to fix network  
order absolutely for anything (including certificates) to work.
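
To make the distinction concrete, here is a minimal Python sketch. The label is a made-up example, and naive_visual_order is a hypothetical helper that is a deliberately simplified stand-in for UAX #9, not an implementation of it. It only shows that the stored (network) order stays fixed while the displayed order can differ.

```python
import unicodedata

# Hypothetical label: Latin "a", Hebrew alef, Hebrew bet, Latin "b",
# stored in logical (network) order.
label = "a\u05d0\u05d1b"


def bidi_classes(s):
    """Return (code point, bidi class) pairs in logical order."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in s]


def naive_visual_order(s):
    """Reverse each run of strong RTL characters (class R or AL), as a
    left-to-right display would roughly show them. Simplified on purpose:
    no embedding levels, no weak or neutral resolution."""
    out, run = [], []
    for ch in s:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            run.append(ch)
        else:
            out.extend(reversed(run))
            run = []
            out.append(ch)
    out.extend(reversed(run))
    return "".join(out)


if __name__ == "__main__":
    print(bidi_classes(label))               # logical order, with bidi classes
    print(naive_visual_order(label) == label)  # display order differs: False
```

Whatever the display layer does, the string handed to the protocol is still the original label, unchanged.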

vint

On Feb 15, 2010, at 2:28 AM, Shawn Steele wrote:

> These types of things are why I think "we" (IETF or someone) need  
> REAL usability testing, rather than merely the conjecture of some  
> (possibly biased) engineers.  I'm getting hints, from user feedback,  
> that what the users expect and some of the various ideas around BIDI  
> display of names don't align.  But those should be verified by  
> experts in usability testing (which isn't me).
>
> -Shawn
>
> ________________________________________
> From: Patrik Fältström [patrik at frobbit.se]
> Sent: Sunday, February 14, 2010 12:39 PM
> To: Vint Cerf
> Cc: Mark Davis ☕; Abdulrahman I. ALGhadir; Shawn Steele; Aharon  
> (Vladimir) Lanin; Michel Suignard; idna-update at alvestrand.no; Slim  
> Amamou
> Subject: Re: Protocol Action: 'Right-to-left scripts for IDNA' to  
> Proposed Standard
>
> [A note I wrote, and rewrote a few times, because people have sent  
> so much information on the mailing list ;-) ]
>
> I also would like to thank Mark for this summary.
>
> What we have to remember is that the most important thing is to  
> ensure that we all agree on what the logical order is for the  
> various characters in a domain name, that is, the order in which the  
> characters are passed around in a protocol where we need  
> interoperability.
>
> We do have experience with issues with the byte order, and what is  
> described in the bidi document is that we do pass around all domain  
> names in one and only one order.
>
> The question is then how to display them.
>
> And if one mixes directionality (as Mark explains might happen, for  
> example, in a URI), especially in an environment with a different  
> overall directionality, weird things *WILL* happen.
>
> In 2008, I did some blog posts about this, with some examples that  
> might be interesting. Maybe not completely proper examples, as I do  
> not know Arabic, but these were tested on my Mac, in Mac OS X, and  
> the images you see are screen dumps:
>
> http://stupid.domain.name/node/681
> http://stupid.domain.name/node/682
> http://stupid.domain.name/node/683
>
> In short, I do not think we know what "the best" solution is.
>
> Sad, but that is it.
>
> What we know though is that a domain name is a domain name. Not  
> text, or word, or a poem or such.
>
> It is a protocol parameter, and as such we will unfortunately see  
> some issues: because the basis is fixed (in this case, that we have  
> one and only one logical order), we will have some constraints that  
> we do not know what to do about.
>
>   Patrik
>
> On 14 feb 2010, at 21.25, Vint Cerf wrote:
>
>> Mark,
>>
>> thanks for this - my sense is that almost anything we try to do is  
>> defeated in cut and paste scenarios where context may be lost.
>>
>> vint
>>
>>
>> On Feb 14, 2010, at 3:23 PM, Mark Davis ☕ wrote:
>>
>>> A few comments on remarks here:
>>>
>>>> Well, as we know, the IDNA protocol didn't adopt the bidi  
>>>> algorithm (UAX #9) fully. They disallowed all bidi markers (LRM,  
>>>> RLM, ...), which are used to solve problems of this kind.
>>>
>>>> Well, I don't think so; it can be done in UAX#9 (well, if URIs  
>>>> have their own rules). UAX#9 does know about the nature of  
>>>> characters (neutral, RTL, LTR, weak, ...), the context direction,  
>>>> etc., and thus there are possible ways to fix these issues in  
>>>> UAX#9 rather than in IDNA itself.
>>>
>>> Changing UAX#9 (aka UBA) at this point would be very difficult,  
>>> because of stability concerns. We've seen before where very minor  
>>> changes to it have caused many problems for users, because it  
>>> changes the layout of existing documents. While not impossible,  
>>> one would have to make a very good case for the change, and be  
>>> prepared to demonstrate, with compelling data, that the benefit  
>>> would be worth the cost.
>>>
>>> The UBA was designed for plain text, not special syntax. And no  
>>> matter how it was structured, it was always clear that one would  
>>> need to be able to override the default; to that end, the marks  
>>> and overrides were added. Because those are disallowed in IDNA,  
>>> this tool is not available, however.  The reason to not allow  
>>> those in IDNA was because of the opportunity for constructing,  
>>> artificially, very confusable IRIs.
>>>
>>> (BTW Looking back at it, one of the problems with the UBA was that  
>>> it tried to do too much. There is a tension between heuristics and  
>>> predictability, and if we could go back in time and redo it, one  
>>> of the things I'd change would be to reduce the heuristics,  
>>> especially around numbers, so as to make it more predictable for  
>>> users.)
>>>
>>> However, it is possible and conformant to UBA to have a higher  
>>> level protocol that reorders labels in a domain name, and in the  
>>> path, and in the query, because it allows for such specialized  
>>> overrides specifically. So you could take the following internal  
>>> string with characters from left to right
>>>
>>> http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P
>>>
>>> and have them display
>>>
>>> ...F/e/d.C.B.a//:http
>>>
>>> This would be possible, but is not necessarily a good idea. The  
>>> problem comes in the interaction between those environments that  
>>> (a) look for IRIs and handle them this way, and (b) environments  
>>> that don't parse for IRIs, or don't recognize them or their  
>>> fragments, or don't display them in the 'new' way once they have  
>>> them. There is already the issue of display being different in RTL  
>>> vs LTR paragraphs; you don't want typing in one environment within  
>>> RTL to give yet different results than in another within RTL.
>>>
>>> And we know that recognizing IRIs (and fragments thereof)  
>>> occurring in plain text is difficult. You don't want  
>>> PAYPAL.JOE.com to appear as PAYPAL.JOE.com in my email, and  
>>> JOE.PAYPAL.com in the address bar, and so on.
>>>
>>> So any design for having a special ordering for IRI BIDI elements  
>>> has to take a host of issues into account. I'm not saying that it  
>>> can't be done, but it is a big job, and any transition has to be  
>>> extremely carefully considered. Various people in Unicode have  
>>> considered it at one time or another, but we've just never seen a  
>>> clear path forward.
>>>
>>> Mark
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update