bidiclasses and delimiters

Soobok Lee lsb at lsb.org
Thu Dec 7 07:38:29 CET 2006


http://www.unicode.org/Public/UNIDATA/extracted/DerivedBidiClass.txt

<quote>
# Bidi_Class=Other_Neutral

0021..0022    ; ON # Po   [2] EXCLAMATION MARK..QUOTATION MARK
0026..0027    ; ON # Po   [2] AMPERSAND..APOSTROPHE
0028          ; ON # Ps       LEFT PARENTHESIS
0029          ; ON # Pe       RIGHT PARENTHESIS
002A          ; ON # Po       ASTERISK
003B          ; ON # Po       SEMICOLON
003C..003E    ; ON # Sm   [3] LESS-THAN SIGN..GREATER-THAN SIGN
003F..0040    ; ON # Po   [2] QUESTION MARK..COMMERCIAL AT
005B          ; ON # Ps       LEFT SQUARE BRACKET
005C          ; ON # Po       REVERSE SOLIDUS
005D          ; ON # Pe       RIGHT SQUARE BRACKET
005E          ; ON # Sk       CIRCUMFLEX ACCENT
005F          ; ON # Pc       LOW LINE
0060          ; ON # Sk       GRAVE ACCENT
007B          ; ON # Ps       LEFT CURLY BRACKET
007C          ; ON # Sm       VERTICAL LINE
007D          ; ON # Pe       RIGHT CURLY BRACKET
007E          ; ON # Sm       TILDE

# ================================================

# Bidi_Class=Common_Separator  

002C          ; CS # Po       COMMA
002E..002F    ; CS # Po   [2] FULL STOP..SOLIDUS
003A          ; CS # Po       COLON

</quote>

Here I list a few cases that may deserve our attention regarding bidi.
These may already have been studied in the IRI effort.
I just want to find out whether there is anything new to work on
in the IDNA/IMA context around this subject.

Common Separators (CS) are weak, not strong LtoR, and all Other_Neutral (ON) characters are neutral.

Comma (CS), colon (CS), and semicolon (ON) are commonly used
as separators when entering multiple email addresses in an MUA.
Bidi addresses of the form Non-ASCII@IDN.IDN, or bare IDN.IDN names,
would cause problems with these delimiters.
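
To check the classes of these delimiters directly, here is a minimal sketch
using Python's standard unicodedata module. The DELIMITERS list and the
bidi_classes helper are names chosen for illustration only, and the values
reported come from the Unicode version bundled with the interpreter, which
may differ from the 2006 data file quoted above.

```python
import unicodedata

# Delimiters discussed in this message (illustrative selection).
DELIMITERS = [",", ":", ";", ".", "/", "@"]

def bidi_classes(chars):
    """Map each character to its Bidi_Class abbreviation (e.g. 'CS', 'ON')."""
    return {ch: unicodedata.bidirectional(ch) for ch in chars}

if __name__ == "__main__":
    for ch, cls in bidi_classes(DELIMITERS).items():
        # Print code point, character name, and bidi class.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

None of these delimiters should come back as a strong type (L, R, or AL),
which is why their placement depends on the surrounding text.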

For example, colon (CS) and slash (SOLIDUS, CS) are used in the following (all IDN labels below are assumed to be RTL):
 http://www.IDN1.IDN2:80/      (displayed as http://www./08:1NDI.1NDI ??)
 http://www.IDN1.IDN2/      (displayed as http://www./2NDI.1NDI ??)
 http://RTL:password@www.msn.com (displayed as http://:LTRpassword@www.msn.com ??)

These examples are not about <a href="">, for which the %-escaped IRI format is recommended,
but about the native form shown in the address bar of browsers.

My point: when we present an IMA, IDN, or IRI to end users in native form,
we should (LtoR)ize all classes of separator characters (ON, CS, etc.)
by inserting, for example, LRE ? PDF sequences, if and only if we need to enforce
a single strict display order of IDN labels and non-ASCII local parts.
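
The insertion step could be sketched as follows. This is only an
illustration under assumptions: the function name ltr_ize_separators and the
SEPARATOR_CLASSES set are hypothetical, only the ON and CS classes are
covered (the "etc." is left open), and whether the result actually displays
in the intended order depends on the renderer's bidi implementation, which
is not verified here.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Assumption: only these two classes are treated as separators.
SEPARATOR_CLASSES = {"ON", "CS"}

def ltr_ize_separators(text):
    """Wrap each ON/CS character in an LRE ... PDF pair for display only."""
    out = []
    for ch in text:
        if unicodedata.bidirectional(ch) in SEPARATOR_CLASSES:
            out.append(LRE + ch + PDF)
        else:
            out.append(ch)
    return "".join(out)

def strip_embedding(text):
    """Remove the inserted LRE/PDF characters to recover the original string."""
    return text.replace(LRE, "").replace(PDF, "")

if __name__ == "__main__":
    # Two Hebrew letters stand in for RTL labels IDN1 and IDN2.
    address = "http://www.\u05D0.\u05D1:80/"
    shown = ltr_ize_separators(address)
    print([f"U+{ord(c):04X}" for c in shown])
    print(strip_embedding(shown) == address)
```

The wrapped string is meant for presentation only. The stored or
transmitted address stays unchanged, so the formatting characters have to
be stripped again before the string is used for resolution or delivery.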

Soobok

