Protocol Action: 'Right-to-left scripts for IDNA' to Proposed Standard
Vint Cerf
vint at google.com
Sun Feb 14 21:25:56 CET 2010
Mark,
thanks for this - my sense is that almost anything we try to do is
defeated in cut and paste scenarios where context may be lost.
vint
On Feb 14, 2010, at 3:23 PM, Mark Davis ☕ wrote:
> A few comments on remarks here:
>
> >Well, as we know, the IDNA protocol didn't adopt the bidi algorithm (UAX
> #9) fully. They disallowed all bidi markers (LRM, RLM, ...), which are
> used to solve problems of this kind.
>
> > Well, I don't think so; it can be done in UAX#9 (well, if URI has its
> own rules). UAX#9 does know about the nature of characters
> (neutral, RTL, LTR, weak, ...), the context direction, etc., and thus there
> are possible ways to fix these issues in UAX#9 rather than in IDNA itself.
>
> Changing UAX#9 (aka UBA) at this point would be very difficult,
> because of stability concerns. We've seen before where very minor
> changes to it have caused many problems for users, because it
> changes the layout of existing documents. While not impossible, one
> would have to make a very good case for the change, and be prepared
> to demonstrate, with compelling data, that the benefit would be
> worth the cost.
>
> The UBA was designed for plain text, not special syntax. And no
> matter how it was structured, it was always clear that one would
> need to be able to override the default; to that end, the marks and
> overrides were added. Because those are disallowed in IDNA, this
> tool is not available, however. The reason to not allow those in
> IDNA was because of the opportunity for constructing, artificially,
> very confusable IRIs.
>
> (BTW Looking back at it, one of the problems with the UBA was that
> it tried to do too much. There is a tension between heuristics and
> predictability, and if we could go back in time and redo it, one of
> the things I'd change would be to reduce the heuristics, especially
> around numbers, so as to make it more predictable for users.)
>
> However, it is possible and conformant to UBA to have a higher level
> protocol that reorders labels in a domain name, and in the path, and
> in the query, because it allows for such specialized overrides
> specifically. So you could take the following internal string with
> characters from left to right
>
> http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P
>
> and have them display
>
> ...F/e/d.C.B.a//:http
>
> This would be possible, but is not necessarily a good idea. The
> problem comes in the interaction between those environments that (a)
> look for IRIs and handle them this way, and (b) environments that
> don't parse for IRIs, or don't recognize them or their fragments, or
> don't display them in the 'new' way once they have them. There is
> already the issue of display being different in RTL vs LTR
> paragraphs; you don't want typing in one environment within RTL to
> give yet different results than in another within RTL.
>
> And we know that recognizing IRIs (and fragments thereof) occurring
> in plain text is difficult. You don't want PAYPAL.JOE.com to appear
> as PAYPAL.JOE.com in my email, and JOE.PAYPAL.com in the address
> bar, and so on.
>
> So any design for having a special ordering for IRI BIDI elements
> has to take a host of issues into account. I'm not saying that it
> can't be done, but it is a big job, and any transition has to be
> extremely carefully considered. Various people in Unicode have
> considered it at one time or another, but we've just never seen a
> clear path forward.
>
> Mark
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
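
The component reordering Mark describes above (internal order
http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P shown as ...F/e/d.C.B.a//:http)
can be sketched roughly as follows. This is only an illustration, not
anything specified by IDNA or UAX#9: the function name and the delimiter
set are assumptions made for the example, and it models only the order of
labels and delimiters, not how the characters inside a right-to-left label
would themselves be rendered.

```python
import re

# Hypothetical delimiter set for this sketch; a real implementation
# would parse the IRI properly instead of splitting on single characters.
_DELIMS = re.compile(r"([:/.?&=#])")


def reorder_for_display(iri: str) -> str:
    """Reverse the order of components and delimiters in an IRI.

    Each component (label, path segment, query key or value) keeps its
    own characters in their original order; only the sequence of
    components and delimiters is flipped.
    """
    # The capture group makes re.split keep the delimiters as tokens.
    tokens = _DELIMS.split(iri)
    return "".join(reversed(tokens))


internal = "http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P"
print(reorder_for_display(internal))
# prints: P=o&n=M&l=K&J=i?h/G/F/e/d.C.B.a//:http
```

An environment that does not recognize the string as an IRI would skip
this step entirely and fall back to plain UBA ordering, which is the
mismatch between environments that the message warns about.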