bidiclasses and delimiters
Soobok Lee
lsb at lsb.org
Thu Dec 7 07:38:29 CET 2006
http://www.unicode.org/Public/UNIDATA/extracted/DerivedBidiClass.txt
<quote>
# Bidi_Class=Other_Neutral
0021..0022 ; ON # Po [2] EXCLAMATION MARK..QUOTATION MARK
0026..0027 ; ON # Po [2] AMPERSAND..APOSTROPHE
0028 ; ON # Ps LEFT PARENTHESIS
0029 ; ON # Pe RIGHT PARENTHESIS
002A ; ON # Po ASTERISK
003B ; ON # Po SEMICOLON
003C..003E ; ON # Sm [3] LESS-THAN SIGN..GREATER-THAN SIGN
003F..0040 ; ON # Po [2] QUESTION MARK..COMMERCIAL AT
005B ; ON # Ps LEFT SQUARE BRACKET
005C ; ON # Po REVERSE SOLIDUS
005D ; ON # Pe RIGHT SQUARE BRACKET
005E ; ON # Sk CIRCUMFLEX ACCENT
005F ; ON # Pc LOW LINE
0060 ; ON # Sk GRAVE ACCENT
007B ; ON # Ps LEFT CURLY BRACKET
007C ; ON # Sm VERTICAL LINE
007D ; ON # Pe RIGHT CURLY BRACKET
007E ; ON # Sm TILDE
# ================================================
# Bidi_Class=Common_Separator
002C ; CS # Po COMMA
002E..002F ; CS # Po [2] FULL STOP..SOLIDUS
003A ; CS # Po COLON
</quote>
Here I freely list some bidi cases that may deserve our attention.
This may already have been studied in the IRI effort; I just want to find out
whether anything new needs to be done in the IDNA/IMA context around this subject.
Common separators (CS) are weak types, not strong LTR, and all Other_Neutral (ON) characters are neutral.
Comma and colon (CS) and semicolon (ON) are commonly used by MUAs
as separators when multiple email addresses are entered.
Non-ASCII bidi (RTL) text in IDN.IDN or IDN.IDN would cause problems
with these delimiters.
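The classes quoted above can be checked with Python's standard `unicodedata.bidirectional()`, which returns a character's Bidi_Class as a short string. The sketch below is illustrative only: `DELIMITERS` and `bidi_classes` are hypothetical names. It prints the class of each delimiter discussed here. Comma, colon, full stop and solidus come out as CS, semicolon and at sign as ON, and none of them is strong L.

```python
import unicodedata

# Delimiters discussed in this post: comma, colon, semicolon (MUA address
# lists) plus full stop, solidus and at sign (domain names, URLs, addresses).
DELIMITERS = [",", ":", ";", ".", "/", "@"]


def bidi_classes(chars):
    """Map each character to its Unicode Bidi_Class string (e.g. 'CS', 'ON')."""
    return {c: unicodedata.bidirectional(c) for c in chars}


if __name__ == "__main__":
    for ch, cls in bidi_classes(DELIMITERS).items():
        print(f"U+{ord(ch):04X} {ch!r}: {cls}")
```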
For example, colon (CS) and slash (SOLIDUS, /) are used in the URLs below (all IDN labels are assumed to be RTL):
http://www.IDN1.IDN2:80/ (displayed as http://www./08:1NDI.1NDI ??)
http://www.IDN1.IDN2/ (displayed as http://www./2NDI.1NDI ??)
http://RTL:password@www.msn.com (displayed as http://:LTRpassword@www.msn.com ??)
These examples do not concern <a href="">, for which the %-escaped IRI form is recommended,
but the native form shown in the browser address bar.
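To see which delimiters in such a URL are exposed to reordering, a rough sketch can flag every CS or ON character that sits directly next to a strong RTL character (class R or AL). This is a simplified heuristic, not the Unicode Bidirectional Algorithm: it ignores numbers, embedding levels and the paragraph direction, so it only points at candidates. The Hebrew letters stand in for the hypothetical RTL labels IDN1 and IDN2, and `rtl_adjacent_separators` is an illustrative name.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}         # strong right-to-left classes
SEPARATOR_CLASSES = {"CS", "ON"}  # common separators and other neutrals


def rtl_adjacent_separators(text):
    """Return (index, char, bidi_class) for each CS/ON character whose
    immediate left or right neighbour is a strong RTL character.

    Heuristic only: it does not run the full bidi algorithm.
    """
    hits = []
    for i, ch in enumerate(text):
        cls = unicodedata.bidirectional(ch)
        if cls not in SEPARATOR_CLASSES:
            continue
        left = unicodedata.bidirectional(text[i - 1]) if i > 0 else None
        right = unicodedata.bidirectional(text[i + 1]) if i + 1 < len(text) else None
        if left in RTL_CLASSES or right in RTL_CLASSES:
            hits.append((i, ch, cls))
    return hits


if __name__ == "__main__":
    # Hypothetical RTL labels: two Hebrew letters each stand in for IDN1, IDN2.
    url = "http://www.\u05d0\u05d1.\u05d2\u05d3:80/"
    print(rtl_adjacent_separators(url))
    # -> [(10, '.', 'CS'), (13, '.', 'CS'), (16, ':', 'CS')]
```

The final '/' is not flagged because its neighbour is a digit; its resolved direction then depends on the paragraph direction, which this heuristic does not model.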
My point: when we present IMAs, IDNs or IRIs to end users in native form,
we should LTR-ize all separator characters of the ON and CS classes (and similar),
for example by inserting LRE...PDF sequences, if and only if we need to enforce
a single, strict display order for IDN labels and non-ASCII local parts.
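The proposal above can be sketched as a string transform. When, and only when, the text contains strong RTL characters, every CS or ON character is wrapped in LRE (U+202A) ... PDF (U+202C). This is a minimal sketch assuming per-character wrapping is the intended granularity. `ltr_ize_separators` is a hypothetical name, and its `before`/`after` parameters are there so other marks can be tried.

One caveat applies. Under the Bidirectional Algorithm, an LRE...PDF pair around a single separator raises only that character to an even (LTR) level. The RTL labels on either side still form one reversed run, so the labels can still appear in reverse order. Placing LRM (U+200E) on both sides of the separator is an alternative worth comparing. Any choice should be checked against real renderers.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
LRM = "\u200e"  # LEFT-TO-RIGHT MARK (alternative, see lead-in)

RTL_CLASSES = {"R", "AL"}         # strong right-to-left classes
SEPARATOR_CLASSES = {"CS", "ON"}  # separators to "LTR-ize"


def ltr_ize_separators(text, before=LRE, after=PDF):
    """Surround every CS/ON character with `before`/`after`, but only if the
    text contains at least one strong RTL character; otherwise return the
    text unchanged (the 'if and only if' condition)."""
    if not any(unicodedata.bidirectional(c) in RTL_CLASSES for c in text):
        return text
    out = []
    for ch in text:
        if unicodedata.bidirectional(ch) in SEPARATOR_CLASSES:
            out.append(before + ch + after)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical RTL domain: two Hebrew labels separated by a full stop.
    print(repr(ltr_ize_separators("\u05d0.\u05d1")))
    print(repr(ltr_ize_separators("\u05d0.\u05d1", LRM, LRM)))
```

Pure-ASCII input such as "www.example.com" passes through unchanged, so the transform only affects addresses that actually contain RTL text.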
Soobok