Reminder: BIDI inter-label tests in -02

Slim Amamou slim at alixsys.com
Tue Sep 9 09:07:51 CEST 2008


I'll take this opportunity to also express my point of view. Maybe it
could shed some new light on the problem.

IMHO, this is a more general problem with separators (the "." in the IDN)
and RTL scripts. A similar problem was discussed on the IDN wiki
some time ago (http://idn.icann.org/Talk:IDNwiki#RTL_scripts_URL_directionality_problem_.28arabic_for_instance.29).

Separators are tools to represent order (sometimes hierarchy) in
conjunction with the contextual directionality of text. I say
contextual because in an RTL text you can have an LTR subtext and vice
versa. Now, what's the problem with the 3.<ALEF>.com representation as
opposed to <ALEF>.3.com in Alireza's example? The problem is that
<ALEF> ends up in the middle, which misrepresents the intended order
of the labels. Notice that com.3.<ALEF> is an equally valid
representation as long as it appears in the right context (for example,
the RTL-localized browser cited by Alireza).
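
To make the example a bit more concrete, here is a rough Python sketch,
only illustrative, that lists the Unicode bidi class of each label in
logical order. The name label_bidi_classes is hypothetical;
unicodedata.bidirectional comes from the Python standard library.

```python
import unicodedata

ALEF = "\u0627"  # ARABIC LETTER ALEF


def label_bidi_classes(fqdn):
    """Return (label, sorted bidi classes) pairs in logical order."""
    return [
        (label, sorted({unicodedata.bidirectional(ch) for ch in label}))
        for label in fqdn.split(".")
    ]


# Logical order of the example: <ALEF>, then 3, then com.
for label, classes in label_bidi_classes(ALEF + ".3.com"):
    print(repr(label), classes)
```

The digit label only has a weak class (EN), so where it is drawn
depends on its neighbours, while <ALEF> is strongly RTL (AL) and com is
strongly LTR (L).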

Given this, I agree with Stephane Bortzmeyer: on the wire there is no
ambiguity, so this issue should be addressed in another document,
especially since it is not specific to IDNs and can arise in URLs and
in any displayed text that uses separators. I'm not aware of any
document that addresses this specific issue.
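
As a small illustration of the "no ambiguity on the wire" point, the
sketch below encodes the example name and splits it into labels.
It assumes Python's built-in "idna" codec, which implements IDNA2003
and not the revision under discussion, so take it only as a sketch.
The name wire_labels is hypothetical.

```python
ALEF = "\u0627"  # ARABIC LETTER ALEF


def wire_labels(fqdn):
    """Encode with Python's IDNA2003 codec; return the labels in wire order."""
    return fqdn.encode("idna").split(b".")


# The label order is fixed by the logical string, whatever a renderer
# later does with it on screen.
print(wire_labels(ALEF + ".3.com"))
```

The encoded labels come out in the same order as in the logical
string; any reordering happens only at display time.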

On Tue, Sep 9, 2008 at 5:04 AM, Erik van der Poel <erikv at google.com> wrote:
>
> Forgive me for not preparing detailed PowerPoint slides, but the basic
> idea of the bidi overrides is that they force the direction to be RTL
> (RLO = right-to-left override) or LTR (LRO = left-to-right override).
> Their effect ends when you hit a PDF (pop directional format).
>
> Obviously, you can still have ambiguity if you use these carelessly.
> The following two are displayed the same way:
>
> <LRO> a b c <PDF>
> <RLO> c b a <PDF>
>
> These are both displayed as "abc". We could remove that ambiguity by
> specifying that LRO is to be used when the first character in a bidi
> string is LTR, RLO when the 1st character is RTL.
>
> However, if we put LRO or RLO at the beginning of every bidi label and
> PDF at the end of every bidi label, we might still have re-ordering
> among labels rather than characters. (I'm not sure about the bidi
> algorithm here.)
>
> One way to overcome this problem is to have LRO or RLO at the
> beginning of the FQDN, and PDF at the end, but this destroys the
> property that each label fully describes itself, and besides, we
> probably don't want to deal with PDF at the end of a TLD.
>
> So perhaps we would just specify that only LRO is to be used (to
> harmonize with the current LTR DNS), and that it must be at the
> beginning of a bidi label (containing at least one RTL character), and
> that there must be a PDF at the end of that label.
>
> One big problem with LRO and PDF is that they are prohibited in
> IDNA2003. However, we have other incompatibilities with IDNA2003 (such
> as ZWJ and ZWNJ), so maybe we can use similar strategies to make the
> transition.
>
> I'm probably missing several things, since it is getting late here too. :-)
>
> Erik
>
> On Mon, Sep 8, 2008 at 6:39 PM, JFC Morfin <jefsey at jefsey.com> wrote:
> > Erik, Andrew,
> > I am not sure everyone is with you. At this stage and time of the
> > night I no longer am. Would it not help everyone to use ppt
> > slides (so everything is clearly displayed) to give a clear example,
> > step by step, analysing where the problem occurs and how the
> > overrides would work?
> > jfc
> >
> >
> > At 02:57 09/09/2008, Erik van der Poel wrote:
> >>Well, I believe we're stuck between a rock and a hard place. On one
> >>side, we have DNAME, which, if used carelessly, can result in FQDNs
> >>that are displayed ambiguously by the Unicode bidi algorithm. On the
> >>other side, we have RTL characters that we would like to use in domain
> >>names, in such a way that their display is unambiguous even in running
> >>text. It's pretty clear that we cannot stop people from using DNAMEs.
> >>But it's also quite clear that we must allow RTL characters in domain
> >>names if we're going to allow other non-ASCII characters too. Finally,
> >>it's clear that bidi strings are most often displayed using the
> >>Unicode bidi algorithm.
> >>
> >>We cannot change that algorithm, but we might be able to work around
> >>it using bidi overrides (LRO and RLO), which get rid of the ambiguity.
> >>I don't know whether the WG members like that idea though. We might
> >>want to list the pros and cons of such a proposal.
> >>
> >>Erik
> >>_______________________________________________
> >>Idna-update mailing list
> >>Idna-update at alvestrand.no
> >>http://www.alvestrand.no/mailman/listinfo/idna-update



--
Slim Amamou
http://alixsys.com

