Mixing of AN and EN (Re: Protocol-08 (and status of Defs-04 and Rationale-06))

Erik van der Poel erikv at google.com
Tue Dec 16 04:42:30 CET 2008


Hmmm, sorry about the multiple emails, but I guess it's not so simple,
because the IDNA2008 rules are based on the dot (.) being the
separator, but in IRIs, we have a couple of different types of
delimiters (ET and ON), so we'd have to test whether any characters
hop over those as a result of the bidi algorithm. This is somewhat
off-topic, so I'll stop here. Sorry.
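
For reference, a minimal sketch of how to look up the bidi class of each
delimiter mentioned in this thread, assuming Python's standard unicodedata
module. The names DELIMITERS, bidi_classes and with_class are made up for
illustration. It only reports the Bidi_Class property; it does not run the
bidi algorithm, so it cannot show whether characters actually hop over a
delimiter.

```python
import unicodedata

# Delimiters from the example IRI quoted below, plus % and $,
# which are also mentioned in the thread.
DELIMITERS = ":/@.;?=&#%$"


def bidi_classes(chars):
    """Map each character to its Unicode Bidi_Class (e.g. 'CS', 'ON', 'ET')."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}


def with_class(chars, bidi_class):
    """Return the characters in `chars` that have the given Bidi_Class."""
    return [ch for ch in chars if unicodedata.bidirectional(ch) == bidi_class]


if __name__ == "__main__":
    for ch, cls in bidi_classes(DELIMITERS).items():
        print(f"U+{ord(ch):04X}  {ch}  {cls}")
    print("ET delimiters:", " ".join(with_class(DELIMITERS, "ET")))
```

With the Unicode data shipped in current Python releases, this reports CS
for : / . and ON for @ ; ? = & and ET for # % $.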

Erik

On Mon, Dec 15, 2008 at 7:27 PM, Erik van der Poel <erikv at google.com> wrote:
> Oh, and % of course. This is also ET and therefore problematic.
>
> Erik
>
> On Mon, Dec 15, 2008 at 7:24 PM, Erik van der Poel <erikv at google.com> wrote:
>> Hi Martin,
>>
>> Back when I was looking into this for IDNA2008, I looked into IRIs a
>> bit too. The only standard delimiter that would pose a problem is #
>> (U+0023), which has bidi property ET, which is disallowed in IDNA2008
>> rules. The delimiters I looked into were :/@.;?=&# as in:
>>
>> scheme://user:password@host.com:port/path.txt;params=abc?query=foo&bar=blah#fragment
>>
>> Do you agree that # is the only problematic one? Or did you have other
>> reasons to believe that LTR is a MUST?
>>
>> Of course, if a server uses other delimiters in its URIs, all bets are
>> off. E.g. $
>>
>> Erik
>>
>> On Mon, Dec 15, 2008 at 6:18 PM, Martin Duerst <duerst at it.aoyama.ac.jp> wrote:
>>> At 04:45 08/12/16, Harald Alvestrand wrote:
>>>>Alireza Saleh wrote:
>>>>> Hi Erik,
>>>>>
>>>>>
>>>>> The latest news we received in this case is that Mark is going to look
>>>>> at it and will write a proposal and send it for public view. There is no
>>>>> exact time for that yet.
>>>>> Correction of this bug may not affect the -bidi rules directly, but it
>>>>> is related to the display of L AN characters in a paragraph. However,
>>>>> this effort may be continued by the UTC and application providers to
>>>>> improve the display of AN and AL characters. When I look at the recent
>>>>> improvements in displaying AN, AL, R characters, I believe there will be
>>>>> no visual confusion when you have AN, AL, R characters in an LTR context
>>>>> such as domains.
>>>>What is your reason to believe that domains are an LTR context?
>>>>
>>>>The idea that domain names may occur in free text has been a basic
>>>>assumption behind the bidi work. If they didn't, the document would be a
>>>>lot shorter.
>>>
>>> There is no assumption of LTR context for domain names. However,
>>> the IRI spec REQUIRES the equivalent of LTR context for IRIs.
>>> The MUST is probably too strong, because it's very difficult to
>>> guarantee in practice, but if you don't have that, there's no
>>> guarantee that an IRI containing components with LTR characters
>>> and components with RTL characters displays consistently.
>>>
>>> Regards,    Martin.
>>>
>>>
>>>
>>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp
>>>
>>>
>>
>

