input time order of IDN/IMA

Martin Duerst duerst at it.aoyama.ac.jp
Fri Dec 29 05:07:33 CET 2006


At 15:06 06/12/08, Soobok Lee wrote:
>On Fri, Dec 08, 2006 at 02:51:44PM +0900, Soobok Lee wrote:
>> On Fri, Dec 08, 2006 at 06:35:58AM +0100, Harald Alvestrand wrote:
>> > 
>> > 
>> > --On 8. desember 2006 12:35 +0900 Soobok Lee <lsb at lsb.org> wrote:
>> > 
>> > >Suggested recommendation (primarily for IMA draft, but also IDNAbis):
>> > >  if bidi IMA/IDN contains at least one strong LtoR char (like .com)
>> > >      labels SHOULD flow left to right.
>> > >  if bidi IMA/IDN contains no strong LtoR chars at all,
>> > >      labels SHOULD flow right to left.
>> > >  dot-delimited bidi localparts SHOULD share the same display order,
>> > >      consistently with labels of bidi IDN parts.
>> > >
>> > 
>> > note the other requirement:
>> > 
>> > display of a domain name (or an email address) copied into running text and 
>> > treated using normal Unicode BIDI algorithms SHOULD be exactly the same as 
>> > the domain name displayed by software that "knows" it's a domain name.
>
>This requirement was already *loosened* by IDNA, since it left room
>for punycode-form display of IDNs, instead of native form, in some
>problematic cases (no fonts, possible spoofing, etc.).
>
>My above suggestion is for such special cases, not for the general one, even though
>that attempt won't succeed in running text, as you said.

Hello Soobok,

What Harald said is *extremely* important. There is a big difference
between switching from e.g. ABCD.EFGH to xn--XXXXXX.xn--YYYYY and
switching from ABCD.EFGH to EFGH.ABCD.

In the first case, it's very clear that one is the original, and
the other is punycode. Of course the average user won't like this,
because in general, nobody likes to see punycode.

In the second case, if one and the same thing can be displayed in
two different sequences, and if two different things can be displayed
the same way, there will be huge confusion.
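
As a rough illustration of the second case (a sketch only, not taken
from any draft): the hypothetical render() helper below treats labels
as opaque tokens and lays them out in one of two label orders. It shows
that one name then has two displays, and that two different names can
share one display.

```python
def render(labels, label_order):
    """Join labels with dots as they would appear left to right on screen.

    label_order "ltr": the first label is leftmost.
    label_order "rtl": the first label is rightmost.
    Both the function and its parameter are hypothetical, for illustration.
    """
    if label_order not in ("ltr", "rtl"):
        raise ValueError("label_order must be 'ltr' or 'rtl'")
    shown = labels if label_order == "ltr" else list(reversed(labels))
    return ".".join(shown)


name_a = ["ABCD", "EFGH"]   # logical name ABCD.EFGH
name_b = ["EFGH", "ABCD"]   # a different logical name, EFGH.ABCD

# One and the same name, two different displays.
same_name_two_displays = render(name_a, "ltr") != render(name_a, "rtl")

# Two different names, one and the same display.
two_names_one_display = render(name_a, "rtl") == render(name_b, "ltr")

print(render(name_a, "rtl"))  # EFGH.ABCD
```

A reader who sees EFGH.ABCD has no way to tell which of the two names
was meant, unless the label order in use is known by other means.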

For more background information on bidi in the context of IRIs
and similar cases, I suggest reading Section 4 of RFC 3987
very carefully, and studying the examples therein.


>> > The illusion that domain names can be treated specially for display under 
>> > all conditions is just that - an illusion.
>> 
>> Yes, it's impossible. That's clear. I'm just looking for recommendations to *reduce*
>> end users' frustrations.

The reason why something like LTRdnoceS.LTRtsriF at LTRlacoL.com,
although in some ways quite suboptimal, was deemed acceptable
(you may call it the 'least bad' solution) is that people who
read bidi text are used to reading alternating stretches of RTL
and LTR text. In the above example, they would therefore first
read "LTRdnoceS.LTRtsriF at LTRlacoL" (from right to left), and
then ".com" (from left to right), which will be just right.
This kind of reading habit in particular applies to people
who have no idea about the structure of Internet identifiers.
So people who know more about the Internet may have to adapt
a bit more for reading bidi identifiers, but that should be
okay. [Of course, the best solution is to introduce non-ASCII
TLDs to make things even easier.]

The above wasn't my idea, it came from Mati Allouche who has
decades of bidi experience at IBM Israel.
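
To make the reading order described above concrete, here is a toy
sketch in Python. It is NOT the Unicode Bidirectional Algorithm; it is
a heavily simplified stand-in under these assumptions: uppercase
letters stand for RTL characters, lowercase letters for LTR characters,
everything else (".", "@") is neutral, and the surrounding text is LTR.
The function name toy_bidi_display is made up for this example.

```python
def toy_bidi_display(logical: str) -> str:
    """Return a simplified visual order for a logical-order string.

    Assumptions (illustration only): uppercase = RTL, lowercase = LTR,
    non-letters are neutral, paragraph direction is LTR.
    """
    # Step 1: classify each character as "R", "L", or None (neutral).
    dirs = []
    for ch in logical:
        if ch.isalpha():
            dirs.append("R" if ch.isupper() else "L")
        else:
            dirs.append(None)

    # Step 2: a run of neutrals takes the direction of its neighbours if
    # both sides agree, otherwise the paragraph direction (LTR).
    n = len(dirs)
    i = 0
    while i < n:
        if dirs[i] is None:
            j = i
            while j < n and dirs[j] is None:
                j += 1
            before = dirs[i - 1] if i > 0 else "L"
            after = dirs[j] if j < n else "L"
            fill = before if before == after else "L"
            for k in range(i, j):
                dirs[k] = fill
            i = j
        else:
            i += 1

    # Step 3: reverse every maximal RTL run, keep LTR runs as they are.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and dirs[j] == dirs[i]:
            j += 1
        run = logical[i:j]
        out.append(run[::-1] if dirs[i] == "R" else run)
        i = j
    return "".join(out)


print(toy_bidi_display("LOCAL@FIRST.SECOND.com"))  # DNOCES.TSRIF@LACOL.com
```

The output has the same shape as the example above: one RTL stretch
holding the local part and the RTL labels, read from right to left,
followed by ".com", read from left to right.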

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


