IDNAbis spec

Abdulrahman I. ALGhadir aghadir at citc.gov.sa
Sat Nov 7 06:29:05 CET 2009


Hey everyone,

"a) Try to get smart: Invent tweaks to the Unicode Bidi Algorithm, 
heuristics for detecting IRIs in context, special treatment for IRIs 
e.g. in browser address fields, and so on. This way, we may be able to 
improve some specific cases, but that could easily come at the expense 
of some other cases, and the solutions may not be applied everywhere in 
the same way, with risks to produce quite a lot of confusion (the same 
domain looking different in different contexts, and different domains 
looking the same)."

Well, I agree with Martin on this: IRIs should be identified in the UAX#9 bidi algorithm by a set of rules that solve most of the directionality problems. If that is possible, the other display-related problems can be solved as well.

For example, take RTL domain names in which a label starts or ends with digits:
Network order:
<RTL_label1><number1>.<number2><RTL_label2>
In an RTL context this is displayed as:
<RTL_label2><number1>.<number2><RTL_label1>
If the bidi algorithm could somehow identify this case and instead display it as:
<RTL_label2><number2>.<number1><RTL_label1>

this would improve current IDNA.
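
To make the difference concrete, here is a rough sketch in Python. It is a token-level toy model, not an implementation of UAX#9: the placeholders stand for whole labels and numbers, the token kinds ("R", "N", "SEP") and the function names are made up for illustration, and the only bidi behaviour it assumes is that a number, a dot, and another number are kept together as one left-to-right number in an RTL context.

```python
def group_runs(tokens):
    """Group network-order tokens into runs; 'N SEP N ...' stays one run."""
    runs = []
    i = 0
    while i < len(tokens):
        run = [tokens[i]]
        if tokens[i][0] == "N":
            # A separator between two numbers joins them into a single number run.
            while (i + 2 < len(tokens)
                   and tokens[i + 1][0] == "SEP"
                   and tokens[i + 2][0] == "N"):
                run += [tokens[i + 1], tokens[i + 2]]
                i += 2
        runs.append(run)
        i += 1
    return runs


def current_display(tokens):
    """Left-to-right visual order in an RTL context under the toy model."""
    out = []
    for run in reversed(group_runs(tokens)):
        out.extend(text for _, text in run)  # number runs keep their inner order
    return "".join(out)


def label_aware_display(tokens):
    """Hypothetical rule: treat the dot as a label boundary and reverse everything."""
    return "".join(text for _, text in reversed(tokens))


network_order = [
    ("R", "<RTL_label1>"), ("N", "<number1>"), ("SEP", "."),
    ("N", "<number2>"), ("R", "<RTL_label2>"),
]

print(current_display(network_order))      # <RTL_label2><number1>.<number2><RTL_label1>
print(label_aware_display(network_order))  # <RTL_label2><number2>.<number1><RTL_label1>
```

In the first output the two numbers look as if they belonged to the wrong labels; in the second each number stays next to its own label.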

AbdulRahman,

-----Original Message-----
From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of "Martin J. Dürst"
Sent: 5/Nov/2009 2:26 PM
To: Shawn Steele
Cc: Alireza Saleh; muhtaseb at kfupm.edu.sa; public-iri at w3.org; idna-update at alvestrand.no; Lisa Dusseault; Abdulrahman I. ALGhadir
Subject: Re: IDNAbis spec

Hello Shawn, others,

[cc-ing public-iri at w3.org, because this is essentially an IRI issue, not 
(only) an IDN issue]

On 2009/11/05 2:57, Shawn Steele wrote:
> 2) http://microsoft.com gets displayed http://microsoft.com because in this case the direction between two runs didn't change so :// will take the run direction which is LTR.
>
> I understand why :)  I'm not sure it's right.  Certainly http://L1.R2 doesn't render right in IE (R2.http://L1 makes no sense).  I could accept that LTR only should maybe do that, but once you have any RTL I think a different rule is needed.
>
> I think that labels are like a list.  If I have a list (a, b, c, d), then I expect the list to be in order a, b, c, d.  In an RTL context I would reasonably expect the list to be rendered (d, c, b, a).  If the individual values happen to be in different scripts, that's not going to change the fact that I expect the list to have each element progress in an orderly fashion from least significant to most significant.
>
> So http://R1.L2.L3.R4, I think that the expectation of R4.L3.L2.R1//:http makes sense.  The list progresses from 1 through 4.

R4.L3.L2.R1//:http makes a lot of sense in particular to people like us 
who know exactly what the components are, what the syntactically 
significant boundaries are, and so on. They may or may not make sense to 
everyday bidi users, the same way http://microsoft.com didn't make any 
sense to an average computer user around 1993.


> Unfortunately the character properties and rendering engines don't help much with that.

Indeed they don't help at all. I think there are essentially two ways to 
deal with this:

a) Try to get smart: Invent tweaks to the Unicode Bidi Algorithm, 
heuristics for detecting IRIs in context, special treatment for IRIs 
e.g. in browser address fields, and so on. This way, we may be able to 
improve some specific cases, but that could easily come at the expense 
of some other cases, and the solutions may not be applied everywhere in 
the same way, with risks to produce quite a lot of confusion (the same 
domain looking different in different contexts, and different domains 
looking the same).

b) Try to use a simple and clear way to display IRIs within the context 
of the Unicode Bidi Algorithm, e.g. as currently specified in RFC 3987 
(or a suitable variant thereof if we can agree on it quickly; I think in 
particular for absolute IRIs, there isn't necessarily a need for 
requiring an LTR embedding direction). Help people understand how to 
read these things (groups of consecutive RTL components are read RTL, 
groups of consecutive LTR components are read LTR, which is *the same 
way* this is done in plain text with groups of words unless there's an 
embedding structure). This will reduce the potential for confusion. 
Average computer users may not have to learn that much (just read these 
things like you read sentences with words from different 
directionalities). Specialists such as us may have to work a bit harder, 
but it may be worth it. Once people get used to it, they will have 
gotten used to it, the same way they got used to http:// and similar 
cryptic stuff in the first place. And in most cases, domain names should 
be RTL.RTL.RTL or LTR.LTR.LTR anyway, and I don't mind if we put a bit 
more pressure on that.
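
The reading rule in (b) can be sketched at the label level. This is only a toy model in Python, assuming an LTR embedding direction and labels that are each purely RTL or purely LTR; the function name and the (direction, name) pairs are made up for illustration, and the scheme, digits, and the character order inside each label are ignored.

```python
from itertools import groupby


def visual_label_order(labels):
    """Return the labels as they would appear left to right on screen.

    labels: list of (direction, name) pairs in logical (network) order,
    where direction is "L" or "R". Assumes an LTR embedding direction.
    """
    out = []
    for direction, group in groupby(labels, key=lambda label: label[0]):
        names = [name for _, name in group]
        if direction == "R":
            names.reverse()  # a group of consecutive RTL labels is read RTL
        out.extend(names)    # groups themselves follow the LTR embedding
    return ".".join(out)


print(visual_label_order([("L", "L1"), ("R", "R2"), ("R", "R3"), ("L", "L4")]))
# L1.R3.R2.L4
print(visual_label_order([("R", "R1"), ("R", "R2"), ("R", "R3")]))
# R3.R2.R1
```

The all-RTL case comes out fully reversed, which is how an RTL reader would expect to read it from right to left.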

In summary, overall, less may be more, even if it may be difficult to 
admit for experts like us.

Regards,   Martin.

> -Shawn
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp


