"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Thu Nov 5 12:26:17 CET 2009
Hello Shawn, others,
[cc-ing public-iri at w3.org, because this is essentially an IRI issue, not
(only) an IDN issue]
On 2009/11/05 2:57, Shawn Steele wrote:
> 2) http://microsoft.com gets displayed http://microsoft.com because in this case the direction between two runs didn't change so :// will take the run direction which is LTR.
> I understand why :) I'm not sure it's right. Certainly http://L1.R2 doesn't render right in IE (R2.http://L1 makes no sense). I could accept that LTR only should maybe do that, but once you have any RTL I think a different rule is needed.
> I think that labels are like a list. If I have a list (a, b, c, d), then I expect the list to be in order a, b, c, d. In an RTL context I would reasonably expect the list to be rendered (d, c, b, a). If the individual values happen to be in different scripts, that's not going to change the fact that I expect the list to have each element progress in an orderly fashion from least significant to most significant.
> So http://R1.L2.L3.R4, I think that the expectation of R4.L3.L2.R1//:http makes sense. The list progresses from 1 through 4.
R4.L3.L2.R1//:http makes a lot of sense, in particular to people like us
who know exactly what the components are, where the syntactically
significant boundaries are, and so on. It may or may not make sense to
everyday bidi users, the same way http://microsoft.com didn't make any
sense to an average computer user around 1993.
> Unfortunately the character properties and rendering engines don't help much with that.
Indeed they don't help at all. I think there are essentially two ways to
deal with this:
a) Try to get smart: Invent tweaks to the Unicode Bidi Algorithm,
heuristics for detecting IRIs in context, special treatment for IRIs
e.g. in browser address fields, and so on. This way, we may be able to
improve some specific cases, but that could easily come at the expense
of some other cases, and the solutions may not be applied everywhere in
the same way, with the risk of producing quite a lot of confusion (the
same domain looking different in different contexts, and different
domains looking the same).
b) Try to use a simple and clear way to display IRIs within the context
of the Unicode Bidi Algorithm, e.g. as currently specified in RFC 3987
(or a suitable variant thereof if we can agree on it quickly; I think in
particular for absolute IRIs, there isn't necessarily a need for
requiring an LTR embedding direction). Help people understand how to
read these things (groups of consecutive RTL components are read RTL,
groups of consecutive LTR components are read LTR, which is *the same
way* this is done in plain text with groups of words unless there's an
embedding structure). This will reduce the potential for confusion.
Average computer users may not have to learn that much (just read these
things like you read sentences with words from different
directionalities). Specialists such as us may have to work a bit harder,
but it may be worth it. Once people get used to it, it will simply be
familiar, the same way they got used to http:// and similar
cryptic stuff in the first place. And in most cases, domain names should
be RTL.RTL.RTL or LTR.LTR.LTR anyway, and I don't mind if we put a bit
more pressure on that.
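
To make the reading rule in b) concrete, here is a minimal sketch in
Python. It is only an illustration under simplifying assumptions: each
component is treated as an atomic unit with a single direction ('L' or
'R'), and the dots, the scheme and "://", and everything else the Unicode
Bidi Algorithm does with neutrals are ignored. The names visual_order and
list_order are made up for this example; they are not taken from RFC 3987
or from any library.

```python
from itertools import groupby


def visual_order(components, base="L"):
    """Return component names in left-to-right visual order.

    components: (name, direction) pairs in logical order, where
    direction is 'L' or 'R'. base is the embedding direction.
    Assumption: components are atomic; separators are ignored.
    """
    # Group consecutive components that share a direction into runs.
    runs = [[name for name, _ in group]
            for _, group in groupby(components, key=lambda c: c[1])]
    directions = [d for d, _ in groupby(c[1] for c in components)]

    # An RTL run is laid out right to left, so listing it from the
    # left edge of the display reverses its logical order.
    laid_out = [run[::-1] if d == "R" else run
                for run, d in zip(runs, directions)]

    # In an RTL embedding the runs themselves progress right to left.
    if base == "R":
        laid_out.reverse()
    return [name for run in laid_out for name in run]


def list_order(components, base="L"):
    """The 'labels are like a list' model: reverse everything in RTL."""
    names = [name for name, _ in components]
    return names[::-1] if base == "R" else names


mixed = [("R1", "R"), ("L2", "L"), ("L3", "L"), ("R4", "R")]
print(visual_order(mixed, base="R"))  # ['R4', 'L2', 'L3', 'R1']
print(list_order(mixed, base="R"))    # ['R4', 'L3', 'L2', 'R1']
```

For R1.L2.L3.R4 in an RTL context, the two models differ only in the
order of the LTR group L2.L3, which is exactly the case under discussion.
For RTL.RTL.RTL or LTR.LTR.LTR names the two give the same result.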
In summary, overall, less may be more, even if it may be difficult to
admit for experts like us.
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp