IDNAbis spec

Thu Nov 5 12:26:17 CET 2009

Hello Shawn, others,

[cc-ing public-iri at w3.org, because this is essentially an IRI issue, not 
(only) an IDN issue]

On 2009/11/05 2:57, Shawn Steele wrote:
> 2) http://microsoft.com gets displayed http://microsoft.com because in this case the direction between two runs didn't change so :// will take the run direction which is LTR.
>
> I understand why :)  I'm not sure it's right.  Certainly http://L1.R2 doesn't render right in IE (R2.http://L1 makes no sense).  I could accept that LTR only should maybe do that, but once you have any RTL I think a different rule is needed.
>
> I think that labels are like a list.  If I have a list (a, b, c, d), then I expect the list to be in order a, b, c, d.  In an RTL context I would reasonably expect the list to be rendered (d, c, b, a).  If the individual values happen to be in different scripts, that's not going to change the fact that I expect the list to have each element progress in an orderly fashion from least significant to most significant.
>
> So http://R1.L2.L3.R4, I think that the expectation of R4.L3.L2.R1//:http makes sense.  The list progresses from 1 through 4.

R4.L3.L2.R1//:http makes a lot of sense, in particular to people like us 
who know exactly what the components are, where the syntactically 
significant boundaries are, and so on. It may or may not make sense to 
everyday bidi users, the same way http://microsoft.com didn't make any 
sense to an average computer user around 1993.

> Unfortunately the character properties and rendering engines don't help much with that.

Indeed they don't help at all. I think there are essentially two ways to 
deal with this:

a) Try to get smart: Invent tweaks to the Unicode Bidi Algorithm, 
heuristics for detecting IRIs in context, special treatment for IRIs 
e.g. in browser address fields, and so on. This way, we may be able to 
improve some specific cases, but that could easily come at the expense 
of other cases, and the solutions may not be applied everywhere in 
the same way, with a risk of producing quite a lot of confusion (the same 
domain looking different in different contexts, and different domains 
looking the same).

b) Try to use a simple and clear way to display IRIs within the context 
of the Unicode Bidi Algorithm, e.g. as currently specified in RFC 3987 
(or a suitable variant thereof if we can agree on it quickly; I think in 
particular for absolute IRIs, there isn't necessarily a need to 
require an LTR embedding direction). Help people understand how to 
read these things (groups of consecutive RTL components are read RTL, 
groups of consecutive LTR components are read LTR, which is *the same 
way* this is done in plain text with groups of words unless there's an 
embedding structure). This will reduce the potential for confusion. 
Average computer users may not have to learn that much (just read these 
things the way you read sentences with words of different 
directionalities). Specialists such as us may have to work a bit harder, 
but it may be worth it. Once people have gotten used to it, it will 
stop looking strange, the same way they got used to http:// and similar 
cryptic stuff in the first place. And in most cases, domain names should 
be RTL.RTL.RTL or LTR.LTR.LTR anyway, and I don't mind if we put a bit 
more pressure on that.
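
To make the reading rule in (b) concrete, here is a rough sketch in 
Python. It is not the Unicode Bidi Algorithm and not the text of RFC 3987; 
it is a toy model under these assumptions: labels are placeholders such as 
L1 or R2 (a leading R followed by digits marks an RTL label, anything else 
is LTR), the characters ':', '/' and '.' are treated as neutrals, a neutral 
takes the direction of its neighbours when they agree and the base 
direction otherwise, and characters inside a label are never reordered. 
The names tokenize, resolve_neutrals and visual_order are made up for this 
illustration.

```python
import re
from itertools import groupby


def tokenize(iri):
    """Split a placeholder IRI into (text, direction) pairs.

    Direction is "L", "R", or "N" (neutral). Assumption: a label like
    R1 stands for an RTL label; every other label is LTR.
    """
    tokens = []
    for part in re.findall(r"[A-Za-z0-9]+|[:/.]", iri):
        if part in (":", "/", "."):
            tokens.append((part, "N"))
        elif re.fullmatch(r"R\d+", part):
            tokens.append((part, "R"))
        else:
            tokens.append((part, "L"))
    return tokens


def resolve_neutrals(dirs, base):
    """Give each neutral the direction of its strong neighbours if they
    agree, otherwise the base direction (also used at either end)."""
    resolved = []
    for i, d in enumerate(dirs):
        if d == "N":
            before = next((x for x in reversed(dirs[:i]) if x != "N"), base)
            after = next((x for x in dirs[i + 1:] if x != "N"), base)
            d = before if before == after else base
        resolved.append(d)
    return resolved


def visual_order(tokens, base="L"):
    """Return the tokens as one string in left-to-right display order."""
    texts = [t for t, _ in tokens]
    dirs = resolve_neutrals([d for _, d in tokens], base)
    runs = []
    for direction, group in groupby(zip(texts, dirs), key=lambda p: p[1]):
        run = [t for t, _ in group]
        # An RTL run is laid out right to left, so reverse its tokens.
        runs.append(run[::-1] if direction == "R" else run)
    if base == "R":
        # In an RTL context the runs themselves progress right to left.
        runs.reverse()
    return "".join("".join(run) for run in runs)


print(visual_order(tokenize("R1.R2.L3.L4")))                   # LTR context
print(visual_order(tokenize("http://L1.R2"), base="R"))        # RTL context
print(visual_order(tokenize("http://R1.L2.L3.R4"), base="R"))  # RTL context
```

Under these assumptions, the second call gives R2.http://L1, the rendering 
quoted above, and the third gives R4.L2.L3.R1//:http rather than the 
list-like R4.L3.L2.R1//:http: the LTR group L2.L3 is still read LTR, 
while the RTL pieces around it progress RTL.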

In summary, less may be more, even if that may be difficult for 
experts like us to admit.

Regards,   Martin.

> -Shawn

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp