Reminder: BIDI inter-label tests in -02

Erik van der Poel erikv at google.com
Tue Sep 9 05:04:43 CEST 2008


Forgive me for not preparing detailed PowerPoint slides, but the basic
idea of the bidi overrides is that they force the direction to be RTL
(RLO = right-to-left override) or LTR (LRO = left-to-right override).
Their effect ends when you hit a PDF (pop directional formatting).

Obviously, you can still have ambiguity if you use these carelessly.
The following two are displayed the same way:

<LRO> a b c <PDF>
<RLO> c b a <PDF>

These are both displayed as "abc". We could remove that ambiguity by
specifying that LRO is to be used when the first character in a bidi
string is LTR, and RLO when the first character is RTL.
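
To make the example above concrete, here is a minimal Python sketch of
how a single override run is laid out. It is a simplified model, not the
full Unicode bidi algorithm: it assumes exactly one LRO or RLO at the
start, one PDF at the end, no nesting and no mirrored characters. The
function name visual_order is hypothetical.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def visual_order(text: str) -> str:
    """Return the left-to-right visual order of one override run.

    Assumption: text is exactly <LRO or RLO> characters <PDF>.
    """
    if len(text) < 2 or text[0] not in (LRO, RLO) or text[-1] != PDF:
        raise ValueError("expected <LRO|RLO> ... <PDF>")
    body = text[1:-1]
    # LRO lays the characters out left to right in logical order;
    # RLO lays them out right to left, so they appear reversed on screen.
    return body if text[0] == LRO else body[::-1]


print(visual_order(LRO + "abc" + PDF))  # abc
print(visual_order(RLO + "cba" + PDF))  # abc
```

Two different logical strings map to the same visual string, which is
the ambiguity described above.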

However, if we put LRO or RLO at the beginning of every bidi label and
PDF at the end of every bidi label, we might still have re-ordering
among labels rather than characters. (I'm not sure about the bidi
algorithm here.)

One way to overcome this problem is to have LRO or RLO at the
beginning of the FQDN, and PDF at the end, but this destroys the
property that each label fully describes itself, and besides, we
probably don't want to deal with PDF at the end of a TLD.

So perhaps we would just specify that only LRO is to be used (to
harmonize with the current LTR DNS), and that it must be at the
beginning of a bidi label (containing at least one RTL character), and
that there must be a PDF at the end of that label.
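
A rough Python sketch of that rule follows. It assumes that "RTL
character" means a character whose Unicode bidi class is R or AL (that
choice is mine, not something the WG has decided), and the helper names
has_rtl, wrap_label and wrap_name are hypothetical.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Assumption: a character counts as RTL if its bidi class is R or AL.
RTL_CLASSES = {"R", "AL"}


def has_rtl(label: str) -> bool:
    """True if the label contains at least one RTL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def wrap_label(label: str) -> str:
    """Put LRO at the start and PDF at the end of a bidi label only."""
    return LRO + label + PDF if has_rtl(label) else label


def wrap_name(name: str) -> str:
    """Apply wrap_label to each dot-separated label independently."""
    return ".".join(wrap_label(label) for label in name.split("."))


# A Hebrew label is wrapped; the all-ASCII label is left alone.
wrapped = wrap_name("\u05d0\u05d1.com")
print([f"U+{ord(ch):04X}" for ch in wrapped])
```

Each label still describes itself under this scheme, and an all-LTR TLD
never carries a PDF.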

One big problem with LRO and PDF is that they are prohibited in
IDNA2003. However, we have other incompatibilities with IDNA2003 (such
as ZWJ and ZWNJ), so maybe we can use similar strategies to make the
transition.

I'm probably missing several things, since it is getting late here too. :-)

Erik

On Mon, Sep 8, 2008 at 6:39 PM, JFC Morfin <jefsey at jefsey.com> wrote:
> Erik, Andrew,
> I am not sure everyone is with you. At this stage and time of the
> night I no longer am. Would it not help everyone to use ppt
> slides (so everything is clearly displayed) to give a clear example,
> step by step, analysing where the problem occurs and how the
> overrides would work?
> jfc
>
>
> At 02:57 09/09/2008, Erik van der Poel wrote:
>>Well, I believe we're stuck between a rock and a hard place. On one
>>side, we have DNAME, which, if used carelessly, can result in FQDNs
>>that are displayed ambiguously by the Unicode bidi algorithm. On the
>>other side, we have RTL characters that we would like to use in domain
>>names, in such a way that their display is unambiguous even in running
>>text. It's pretty clear that we cannot stop people from using DNAMEs.
>>But it's also quite clear that we must allow RTL characters in domain
>>names if we're going to allow other non-ASCII characters too. Finally,
>>it's clear that bidi strings are most often displayed using the
>>Unicode bidi algorithm.
>>
>>We cannot change that algorithm, but we might be able to work around
>>it using bidi overrides (LRO and RLO), which get rid of the ambiguity.
>>I don't know whether the WG members like that idea though. We might
>>want to list the pros and cons of such a proposal.
>>
>>Erik
>>_______________________________________________
>>Idna-update mailing list
>>Idna-update at alvestrand.no
>>http://www.alvestrand.no/mailman/listinfo/idna-update
>

