Reminder: BIDI inter-label tests in -02

Erik van der Poel erikv at google.com
Wed Sep 10 17:11:13 CEST 2008


Hi Eric,

I only mentioned the LRO/PDF idea in an effort to consider other
alternatives when we were discussing "the DNAME problem". I don't know
whether we consider the DNAME problem to be big enough to cause us to
resort to a "metadata" approach like LRO/PDF. Perhaps many WG
participants believe that the DNAME problem is not a big deal, and
that we should just forge ahead with the current IDNA200X bidi draft.

What other types of metadata do you have in mind, and what big
problems are solved by those?

Erik

On Wed, Sep 10, 2008 at 7:55 AM, Eric Brunner-Williams
<ebw at abenaki.wabanaki.net> wrote:
> Erik,
>
> The same thought occurred to me Monday afternoon as I sat in on Richard
> Ishida's bidi tutorial at ICU-32 and realized for the first time that
> the directionality leak we see (a) is common to a bunch of punctuation
> marks, and (b) has been well known to the Unicode community for a dog's
> age, unlike the rest of us, who "got it" more or less at Dublin. I've no
> excuse for being retarded, as 15 years ago I was on the bidi mailing list.
>
> But if we toss in meta-data, why stop there?
>
> Eric
>
> Erik van der Poel wrote:
>> Forgive me for not preparing detailed PowerPoint slides, but the basic
>> idea of the bidi overrides is that they force the direction to be RTL
>> (RLO = right-to-left override) or LTR (LRO = left-to-right override).
>> Their effect ends when you hit a PDF (pop directional formatting).
>>
>> Obviously, you can still have ambiguity if you use these carelessly.
>> The following two are displayed the same way:
>>
>> <LRO> a b c <PDF>
>> <RLO> c b a <PDF>
>>
>> These are both displayed as "abc". We could remove that ambiguity by
>> specifying that LRO is to be used when the first character in a bidi
>> string is LTR, and RLO when the first character is RTL.
>>
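To make the override example above concrete, here is a minimal Python sketch. It is a toy model, not the Unicode bidi algorithm: it assumes a single override run with no nesting, and the helper names visual_order and choose_override are hypothetical. The second helper encodes the proposed rule (LRO when the first character is LTR, RLO when it is RTL) using the bidi classes L, R and AL reported by Python's unicodedata module. Raising an error for any other first character is my assumption, since the proposal does not say what to do in that case.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def visual_order(s):
    """Toy model: characters as displayed, left to right, for one override run.

    Assumes the string is exactly <LRO|RLO> body <PDF> with no nesting.
    """
    if len(s) < 2 or s[0] not in (LRO, RLO) or s[-1] != PDF:
        raise ValueError("expected <LRO|RLO> ... <PDF>")
    body = s[1:-1]
    # LRO lays the body out in logical order; RLO lays it out reversed.
    return body if s[0] == LRO else body[::-1]


def choose_override(label):
    """Pick the override from the bidi class of the first character."""
    if not label:
        raise ValueError("empty label")
    bidi_class = unicodedata.bidirectional(label[0])
    if bidi_class == "L":
        return LRO
    if bidi_class in ("R", "AL"):
        return RLO
    # Assumption: the proposal does not cover neutral or numeric first characters.
    raise ValueError("first character is neither LTR nor RTL")


# The two encodings from the example display identically in this model.
print(visual_order(LRO + "abc" + PDF) == visual_order(RLO + "cba" + PDF))
```

Under the rule in choose_override, a label starting with a Latin letter may only carry LRO, so the <RLO> c b a <PDF> encoding is excluded and the ambiguity in the example goes away.
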
>> However, if we put LRO or RLO at the beginning of every bidi label and
>> PDF at the end of every bidi label, we might still have re-ordering
>> among labels rather than characters. (I'm not sure about the bidi
>> algorithm here.)
>>
>> One way to overcome this problem is to have LRO or RLO at the
>> beginning of the FQDN, and PDF at the end, but this destroys the
>> property that each label fully describes itself, and besides, we
>> probably don't want to deal with PDF at the end of a TLD.
>>
>> So perhaps we would just specify that only LRO is to be used (to
>> harmonize with the current LTR DNS), and that it must be at the
>> beginning of a bidi label (containing at least one RTL character), and
>> that there must be a PDF at the end of that label.
>>
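The LRO-only rule above could look roughly like this in Python. This is a sketch under stated assumptions: wrap_bidi_label and wrap_fqdn are hypothetical names, and "RTL character" is taken to mean bidi class R or AL as reported by unicodedata, which may not match the definition the bidi draft ends up using. Labels without any RTL character are left untouched.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def has_rtl(label):
    """True if the label contains at least one character of bidi class R or AL."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)


def wrap_bidi_label(label):
    """Wrap a bidi label as <LRO> label <PDF>; return other labels unchanged."""
    if has_rtl(label):
        return LRO + label + PDF
    return label


def wrap_fqdn(name):
    """Apply the rule to each dot-separated label independently."""
    return ".".join(wrap_bidi_label(label) for label in name.split("."))


# Only the first label contains RTL characters, so only it is wrapped.
wrapped = wrap_fqdn("\u05D0\u05D1\u05D2.example.com")
print([hex(ord(ch)) for ch in wrapped.split(".")[0]])
```

Because each label carries its own LRO and PDF, every label still describes itself, which is the property the FQDN-wide variant would lose.
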
>> One big problem with LRO and PDF is that they are prohibited in
>> IDNA2003. However, we have other incompatibilities with IDNA2003 (such
>> as ZWJ and ZWNJ), so maybe we can use similar strategies to make the
>> transition.
>>
>> I'm probably missing several things, since it is getting late here too. :-)
>>
>> Erik
>>
>> On Mon, Sep 8, 2008 at 6:39 PM, JFC Morfin <jefsey at jefsey.com> wrote:
>>
>>> Erik, Andrew,
>>> I am not sure everyone is with you. At this time of the night I no
>>> longer am. Would it not help everyone to use ppt slides (so
>>> everything is clearly displayed) to give a clear example, step by
>>> step, analysing where the problem occurs and how the overrides
>>> would work?
>>> jfc
>>>
>>>
>>> At 02:57 09/09/2008, Erik van der Poel wrote:
>>>
>>>> Well, I believe we're stuck between a rock and a hard place. On one
>>>> side, we have DNAME, which, if used carelessly, can result in FQDNs
>>>> that are displayed ambiguously by the Unicode bidi algorithm. On the
>>>> other side, we have RTL characters that we would like to use in domain
>>>> names, in such a way that their display is unambiguous even in running
>>>> text. It's pretty clear that we cannot stop people from using DNAMEs.
>>>> But it's also quite clear that we must allow RTL characters in domain
>>>> names if we're going to allow other non-ASCII characters too. Finally,
>>>> it's clear that bidi strings are most often displayed using the
>>>> Unicode bidi algorithm.
>>>>
>>>> We cannot change that algorithm, but we might be able to work around
>>>> it using bidi overrides (LRO and RLO), which get rid of the ambiguity.
>>>> I don't know whether the WG members like that idea though. We might
>>>> want to list the pros and cons of such a proposal.
>>>>
>>>> Erik
>>>> _______________________________________________
>>>> Idna-update mailing list
>>>> Idna-update at alvestrand.no
>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>