Protocol Action: 'Right-to-left scripts for IDNA' to Proposed Standard

Sun Feb 14 21:23:05 CET 2010

A few comments on remarks here:

> Well, as we know, the IDNA protocol didn't adopt the bidi algorithm (UAX #9)
> fully. They disallowed all bidi markers (LRM, RLM, ...), which are what is
> used to solve problems of this kind.

> Well, I don't think so; it can be done in UAX #9 (well, if URIs have their own
> rules). UAX #9 does know about the nature of characters (neutral, RTL, LTR,
> weak, ...), the context direction, etc., and thus there are possible ways to
> fix these issues in UAX #9 rather than in IDNA itself.

Changing UAX #9 (aka UBA) at this point would be very difficult, because of
stability concerns. We've seen very minor changes to it cause many problems
for users, because such changes alter the layout of existing documents. While
not impossible, one would have to make a very good case for the change, and be
prepared to demonstrate, with compelling data, that the benefit would be worth
the cost.

The UBA was designed for plain text, not special syntax. And no matter how
it was structured, it was always clear that one would need to be able to
override the default; to that end, the marks and overrides were added.
Because those are disallowed in IDNA, however, that tool is not available
there. They were disallowed in IDNA because they would make it possible to
construct, artificially, very confusable IRIs.
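The practical effect of that restriction can be sketched as a check that rejects any label containing explicit bidi formatting characters. This is a minimal illustration, not the IDNA2008 derivation itself (which excludes these characters through its derived property tables); the names `BIDI_CONTROLS`, `find_bidi_controls`, and `reject_bidi_controls` are hypothetical, and the set includes ALM and the isolate controls, which were added to Unicode after this message was written.

```python
# Hypothetical table of the explicit directional formatting characters
# listed in UAX #9. ALM and the isolates (LRI, RLI, FSI, PDI) were added
# in Unicode 6.3, after this message was written.
BIDI_CONTROLS = {
    "\u200E": "LRM",
    "\u200F": "RLM",
    "\u061C": "ALM",
    "\u202A": "LRE",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u202D": "LRO",
    "\u202E": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}


def find_bidi_controls(label: str) -> list[str]:
    """Return the names of any bidi control characters in the label, in order."""
    return [BIDI_CONTROLS[ch] for ch in label if ch in BIDI_CONTROLS]


def reject_bidi_controls(label: str) -> None:
    """Hypothetical check: raise ValueError if the label contains bidi controls."""
    found = find_bidi_controls(label)
    if found:
        raise ValueError(
            "label contains bidi control characters: " + ", ".join(found)
        )
```

For example, `find_bidi_controls("paypal\u202Ecom")` returns `["RLO"]`. Without such a check, the RIGHT-TO-LEFT OVERRIDE would force the trailing letters to render reversed, which is exactly the kind of artificial confusability the restriction is meant to prevent.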

(BTW, looking back at it, one of the problems with the UBA was that it tried
to do too much. There is a tension between heuristics and predictability,
and if we could go back in time and redo it, one of the things I'd change
would be to reduce the heuristics, especially around numbers, so as to make
it more predictable for users.)

However, it is possible and conformant to the UBA to have a higher-level
protocol that reorders labels in a domain name, and in the path, and in the
query, because the UBA specifically allows for such specialized overrides. So
you could take the following internal string, with characters listed from
left to right (as in the UAX #9 examples, uppercase letters stand for RTL
characters):

http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P

and have it displayed as:

...F/e/d.C.B.a//:http
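The reordering in this example can be modelled, very roughly, as reversing the sequence of components and delimiters while keeping each component intact. The sketch below assumes a hypothetical function `display_order_rtl` and a hypothetical delimiter set (`: / . ? = & #`); it is not a UBA implementation, does not describe any existing browser, and ignores how the characters inside each component would themselves be rendered.

```python
import re

# Hypothetical tokenizer: each delimiter is its own token, and each maximal
# run of non-delimiter characters is one opaque component.
_TOKEN = re.compile(r"[:/.?=&#]|[^:/.?=&#]+")


def display_order_rtl(iri: str) -> str:
    """Return the IRI with its components and delimiters in reverse order,
    as one hypothetical higher-level protocol might lay it out in an RTL
    context. Characters inside a component are not reordered here."""
    tokens = _TOKEN.findall(iri)
    return "".join(reversed(tokens))


print(display_order_rtl("http://a.B.C.d/e/F/G/h?i=J&K=l&M=n&o=P"))
```

Running it prints `P=o&n=M&l=K&J=i?h/G/F/e/d.C.B.a//:http`, whose tail matches the display shown above. Treating every component as an atomic unit is the simplifying assumption here; a real design would also have to decide how each component is rendered internally and how the result interacts with the surrounding paragraph direction.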

Such a reordering would be possible, but it is not necessarily a good idea.
The problem lies in the interaction between (a) environments that look for
IRIs and handle them this way, and (b) environments that don't parse for
IRIs, don't recognize them or their fragments, or don't display them in the
'new' way once they have them. Display already differs between RTL and LTR
paragraphs; you don't want typing the same text into an RTL paragraph in one
environment to give yet a different result than in an RTL paragraph in
another.

And we know that recognizing IRIs (and fragments thereof) occurring in plain
text is difficult. You don't want PAYPAL.JOE.com to appear as PAYPAL.JOE.com in
my email, and JOE.PAYPAL.com in the address bar, and so on.

So any design for having a special ordering for IRI bidi elements has to
take a host of issues into account. I'm not saying that it can't be done,
but it is a big job, and any transition has to be extremely carefully
considered. Various people in Unicode have considered it at one time or
another, but we've just never seen a clear path forward.

Mark