Reminder: BIDI inter-label tests in -02

Erik van der Poel erikv at google.com
Tue Sep 9 16:47:56 CEST 2008


On Tue, Sep 9, 2008 at 12:07 AM, Slim Amamou <slim at alixsys.com> wrote:
> I'll take this opportunity to also express my point of view. Maybe it
> could shed some new light on the problem.
>
> IMHO, this is a more general problem with separators (the "." in the IDN)
> and RTL scripts. A similar problem has been discussed on the IDN wiki
> some time ago (http://idn.icann.org/Talk:IDNwiki#RTL_scripts_URL_directionality_problem_.28arabic_for_instance.29)
>
> Separators are tools to represent order (sometimes hierarchy) in
> conjunction with the contextual directionality of text. I said
> contextual because in a RTL text you can have a LTR subtext and vice
> versa. Now, what's the problem with the 3.<ALEF>.com representation as
> opposed to <ALEF>.3.com in Alireza's example? The problem is that
> <ALEF> is in the middle which gives a wrong representation of the
> expressed order of labels. Notice that com.3.<ALEF> is an equally
> valid representation as long as it is in the right context (like the
> RTL localized browser cited by Alireza for example).
>
> Given this, I agree with Stephane Bortzmeyer: on the wire there is no
> ambiguity, so this issue should be addressed in another document.

Another IETF document? Published at the same time as IDNA200X? If we
don't publish something at the same time, we may see more
inconsistencies between the rules adopted by the TLD registries (e.g.
Israel, Egypt, etc).

> Especially since this issue is not specifically tied to IDNs and can
> happen in URLs and any displayable text that uses separators. I'm not
> aware of any document that addresses this specific issue.

The rules specified in the current IDNA200X bidi draft can be used
almost as effectively in URLs. The only separator that may cause
problems is # (for "fragments" at the end of the URL). This character
has bidi category ET, which is problematic (see the draft):

http://www.ietf.org/internet-drafts/draft-ietf-idnabis-bidi-02.txt
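To check the bidi categories involved, here is a minimal sketch using
Python's standard unicodedata module. The set of delimiters is only an
illustrative assumption, not a list taken from the draft, and the values
reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Illustrative set of URL delimiters; this selection is an assumption,
# not a list taken from the bidi draft.
URL_DELIMITERS = ".:/?#"


def bidi_classes(chars):
    """Map each character to its Unicode bidi class (e.g. 'CS', 'ET')."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


if __name__ == "__main__":
    for ch, cls in bidi_classes(URL_DELIMITERS).items():
        print(f"U+{ord(ch):04X} {ch!r}: {cls}")
```

The point of interest is that "#" comes out as ET (European terminator),
unlike "." which is CS (common separator).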

Erik

> On Tue, Sep 9, 2008 at 5:04 AM, Erik van der Poel <erikv at google.com> wrote:
>>
>> Forgive me for not preparing detailed PowerPoint slides, but the basic
>> idea of the bidi overrides is that they force the direction to be RTL
>> (RLO = right to left override) or LTR (LRO = left to right override).
>> Their effect ends when you hit a PDF (pop directional formatting).
>>
>> Obviously, you can still have ambiguity if you use these carelessly.
>> The following two are displayed the same way:
>>
>> <LRO> a b c <PDF>
>> <RLO> c b a <PDF>
>>
>> These are both displayed as "abc". We could remove that ambiguity by
>> specifying that LRO is to be used when the first character in a bidi
>> string is LTR, and RLO when the first character is RTL.
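The ambiguity and the proposed rule can be sketched as follows. This is
a simplified model, not an implementation of the Unicode bidi algorithm:
it only handles a string of the form <LRO|RLO> text <PDF>, and it assumes
that "RTL" means bidi class R or AL. The helper names (visual_order,
choose_override, wrap) are hypothetical.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def visual_order(s):
    """Left-to-right display order of '<LRO|RLO> text <PDF>'.

    Simplified model: under LRO the text is shown in logical order,
    under RLO it is shown reversed. Not a full bidi implementation.
    """
    if len(s) < 2 or s[0] not in (LRO, RLO) or s[-1] != PDF:
        raise ValueError("expected <LRO|RLO> ... <PDF>")
    body = s[1:-1]
    return body if s[0] == LRO else body[::-1]


def choose_override(text):
    """Proposed rule: RLO if the first character is RTL, else LRO.

    Assumption: 'RTL' means bidi class R or AL.
    """
    first = unicodedata.bidirectional(text[0])
    return RLO if first in ("R", "AL") else LRO


def wrap(text):
    """Wrap text in the override chosen by the rule, closed by PDF."""
    return choose_override(text) + text + PDF


if __name__ == "__main__":
    # Two different logical strings, one display.
    print(visual_order(LRO + "abc" + PDF))
    print(visual_order(RLO + "cba" + PDF))
```

With the rule applied, "abc" can only be written with LRO, so the
second encoding above would no longer be allowed.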
>>
>> However, if we put LRO or RLO at the beginning of every bidi label and
>> PDF at the end of every bidi label, we might still have re-ordering
>> among labels rather than characters. (I'm not sure about the bidi
>> algorithm here.)
>>
>> One way to overcome this problem is to have LRO or RLO at the
>> beginning of the FQDN, and PDF at the end, but this destroys the
>> property that each label fully describes itself, and besides, we
>> probably don't want to deal with PDF at the end of a TLD.
>>
>> So perhaps we would just specify that only LRO is to be used (to
>> harmonize with the current LTR DNS), and that it must be at the
>> beginning of a bidi label (containing at least one RTL character), and
>> that there must be a PDF at the end of that label.
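A rough sketch of this LRO-only variant, label by label. It assumes that
a "bidi label" is one containing at least one character of bidi class R
or AL (whether other classes such as AN should count is left open here),
and the function names are hypothetical.

```python
import unicodedata

LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def is_bidi_label(label):
    """True if the label has at least one RTL character.

    Assumption: 'RTL character' means bidi class R or AL.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)


def wrap_label(label):
    """Put LRO at the start and PDF at the end of a bidi label only."""
    return LRO + label + PDF if is_bidi_label(label) else label


def wrap_fqdn(name):
    """Apply wrap_label to each dot-separated label of a domain name."""
    return ".".join(wrap_label(label) for label in name.split("."))


if __name__ == "__main__":
    # U+0627 is ARABIC LETTER ALEF, as in the 3.<ALEF>.com example.
    print(repr(wrap_fqdn("3.\u0627.com")))
```

Labels without RTL characters are left untouched, so plain ASCII names
are unaffected.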
>>
>> One big problem with LRO and PDF is that they are prohibited in
>> IDNA2003. However, we have other incompatibilities with IDNA2003 (such
>> as ZWJ and ZWNJ), so maybe we can use similar strategies to make the
>> transition.
>>
>> I'm probably missing several things, since it is getting late here too. :-)
>>
>> Erik
>>
>> On Mon, Sep 8, 2008 at 6:39 PM, JFC Morfin <jefsey at jefsey.com> wrote:
>> > Erik, Andrew,
>> > I am not sure everyone is with you. At this stage and time in the
>> > night I am not anymore. Would it not help everyone to use ppt
>> > slides (so everything is clearly displayed) to give a clear example,
>> > step by step, analysing where the problem occurs and how the
>> > overrides would work?
>> > jfc
>> >
>> >
>> > At 02:57 09/09/2008, Erik van der Poel wrote:
>> >>Well, I believe we're stuck between a rock and a hard place. On one
>> >>side, we have DNAME, which, if used carelessly, can result in FQDNs
>> >>that are displayed ambiguously by the Unicode bidi algorithm. On the
>> >>other side, we have RTL characters that we would like to use in domain
>> >>names, in such a way that their display is unambiguous even in running
>> >>text. It's pretty clear that we cannot stop people from using DNAMEs.
>> >>But it's also quite clear that we must allow RTL characters in domain
>> >>names if we're going to allow other non-ASCII characters too. Finally,
>> >>it's clear that bidi strings are most often displayed using the
>> >>Unicode bidi algorithm.
>> >>
>> >>We cannot change that algorithm, but we might be able to work around
>> >>it using bidi overrides (LRO and RLO), which get rid of the ambiguity.
>> >>I don't know whether the WG members like that idea though. We might
>> >>want to list the pros and cons of such a proposal.
>> >>
>> >>Erik
>> >>_______________________________________________
>> >>Idna-update mailing list
>> >>Idna-update at alvestrand.no
>> >>http://www.alvestrand.no/mailman/listinfo/idna-update
>> >
>
>
>
> --
> Slim Amamou
> http://alixsys.com
>

