Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Tue Dec 16 05:08:48 CET 2008

Hello Erik,

It's a much more fundamental problem. Assume that there are two
domain names (a) com.ORG and (b) ORG.com (logical order, upper-case
represents RTL characters). Now these will appear in LTR text as:

(a) llllll com.GRO lllll
(b) llllll GRO.com lllll

but in RTL text, they will appear as:

(a) RRRRRR GRO.com RRRRR
(b) RRRRRR com.GRO RRRRR

You'll note that (a) and (b) are interchanged. There's no chance
that somebody puts one of these on a napkin and knows for sure
which one it was. With the LTR context restriction (using LRE-PDF
or <span dir='ltr'>-</span> or some such), the last two
lines will change to:

(a) RRRRRR com.GRO RRRRR
(b) RRRRRR GRO.com RRRRR

which looks much better.
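
The four displays above can be reproduced with a toy model. The sketch below is not the full Unicode Bidirectional Algorithm; it assumes the convention of this message (lower-case = LTR, upper-case = RTL, everything else neutral), handles only the paragraph direction, and ignores digits and explicit embeddings. The function name visual_order is made up for illustration.

```python
def visual_order(text, base):
    """Toy bidi reordering. base is "ltr" or "rtl" (paragraph direction).

    Assumption: lower-case letters are LTR, upper-case letters are RTL,
    and every other character is treated as a neutral.
    """
    base_type = "L" if base == "ltr" else "R"
    types = ["L" if c.islower() else "R" if c.isupper() else "N" for c in text]
    n = len(types)

    # Resolve neutrals: a run of neutrals takes the direction of its
    # neighbours if they agree, otherwise the paragraph direction.
    resolved = list(types)
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else base_type
        after = types[j] if j < n else base_type
        fill = before if before == after else base_type
        for k in range(i, j):
            resolved[k] = fill
        i = j

    # Assign levels: even levels are LTR, odd levels are RTL.
    if base == "ltr":
        levels = [0 if t == "L" else 1 for t in resolved]
    else:
        levels = [1 if t == "R" else 2 for t in resolved]

    # Reorder: from the highest level down to 1, reverse every maximal
    # run of characters whose level is at least the current level.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = reversed(chars[i:j])
            i = j
    return "".join(chars)


for name in ("com.ORG", "ORG.com"):
    print(name, "| LTR:", visual_order(name, "ltr"),
          "| RTL:", visual_order(name, "rtl"))
# com.ORG | LTR: com.GRO | RTL: GRO.com
# ORG.com | LTR: GRO.com | RTL: com.GRO
```

In this model, forcing an LTR context around the name corresponds to calling visual_order(name, "ltr") regardless of the surrounding paragraph, which is why (a) and (b) then display the same way in both kinds of text.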

Note that with leading URI schemes, distinction may be easier
(the URI scheme, at least for now, is always in ASCII, and
so the directionality can be identified by checking which side
of the IRI the scheme ends up on).

Regards,    Martin.

At 12:27 08/12/16, Erik van der Poel wrote:
>Oh, and % of course. This is also ET and therefore problematic.
>
>Erik
>
>On Mon, Dec 15, 2008 at 7:24 PM, Erik van der Poel <erikv at google.com> wrote:
>> Hi Martin,
>>
>> Back when I was looking into this for IDNA2008, I looked into IRIs a
>> bit too. The only standard delimiter that would pose a problem is #
>> (U+0023), which has bidi property ET, which is disallowed in IDNA2008
>> rules. The delimiters I looked into were :/@.;?=&# as in:
>>
>> scheme://user:password@host.com:port/path.txt;params=abc?query=foo&bar=blah#fragment
>>
>> Do you agree that # is the only problematic one? Or did you have other
>> reasons to believe that LTR is a MUST?
>>
>> Of course, if a server uses other delimiters in its URIs, all bets are
>> off. E.g. $
>>
>> Erik
>>
>> On Mon, Dec 15, 2008 at 6:18 PM, Martin Duerst <duerst at it.aoyama.ac.jp> wrote:
>>> At 04:45 08/12/16, Harald Alvestrand wrote:
>>>>Alireza Saleh wrote:
>>>>> Hi Erik,
>>>>>
>>>>>
>>>>> The latest news we received in this case is that Mark is going to look
>>>>> at it and will write a proposal and send it for public view. There is no
>>>>> exact time for that yet.
>>>>> Correction of this bug may not affect the -bidi rules directly, but it
>>>>> is related to the display of L AN characters in a paragraph. However,
>>>>> this effort may be continued by UTC and application providers to
>>>>> improve the display of AN and AL characters. When I look at the recent
>>>>> improvements in displaying AN, AL, R characters, I believe there will
>>>>> be no visual confusion when you have AN, AL, R characters in LTR
>>>>> contexts such as domains.
>>>>What is your reason to believe that domains are an LTR context?
>>>>
>>>>The idea that domain names may occur in free text has been a basic
>>>>assumption behind the bidi work. If they didn't, the document would be a
>>>>lot shorter.
>>>
>>> There is no assumption of LTR context for domain names. However,
>>> the IRI spec REQUIRES the equivalent of LTR context for IRIs.
>>> The MUST is probably too strong, because it's very difficult to
>>> guarantee in practice, but if you don't have that, there's no
>>> guarantee that an IRI containing components with LTR characters
>>> and components with RTL characters displays consistently.
>>>
>>> Regards,    Martin.
>>>
>>>
>>>
>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>>
>>>
>>

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp