bidi control chars (RLE/LRE ... PDF) mapping out
Soobok Lee
lsb at lsb.org
Sat Dec 16 06:31:23 CET 2006
Bidi-related Suggestion:
Leading <RLE> and <LRE> and trailing <PDF> in a Unicode identifier string
can be safely mapped out to nothing without affecting the internal
display order of the identifier string,
while they should be prohibited in any other position of an identifier string.
1) This can be done outside of nameprep/stringprep200X.
2) This can be done within the scope of nameprep/stringprep200X.
I support 2). It is simple to implement. Complex RtL normalized is needed.
This can be categorized as a "context-dependent mapping-out list".
But both 1) and 2) can be performed on the same input by clients.
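A minimal sketch of the mapping suggested above, written in JavaScript to match the samples at the end of this message. The function name mapOutEmbedding is hypothetical and is not part of nameprep/stringprep. It assumes that any number of leading RLE/LRE and trailing PDF characters may be dropped, and it does not check that the embeddings are balanced.

```javascript
// Hypothetical helper illustrating the suggestion; not a nameprep/stringprep API.
const LRE = "\u202A"; // LEFT-TO-RIGHT EMBEDDING
const RLE = "\u202B"; // RIGHT-TO-LEFT EMBEDDING
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING

function mapOutEmbedding(identifier) {
  // Skip leading RLE/LRE characters (assumption: any number is allowed).
  let start = 0;
  while (start < identifier.length &&
         (identifier[start] === RLE || identifier[start] === LRE)) {
    start++;
  }
  // Skip trailing PDF characters.
  let end = identifier.length;
  while (end > start && identifier[end - 1] === PDF) {
    end--;
  }
  const core = identifier.slice(start, end);
  // In any other position these controls are prohibited.
  if (/[\u202A\u202B\u202C]/.test(core)) {
    throw new Error("RLE/LRE/PDF found in a prohibited position");
  }
  return core;
}
```

A client could call this on pasted input before handing the result to nameprep, or the same rule could be applied inside nameprep itself, as in options 1) and 2).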
Reason:
While <RLE or LRE> ... <PDF> sequences are used to embed
an R2L or L2R substring (often a Unicode identifier such as an IMA/IDN)
in a surrounding L2R or R2L string (often plain running text), respectively,
they DON'T change the internal display order of the embedded substring;
they only affect the embedded block's external display order
within the surrounding text.
That is, <RLE>BLOCK1<PDF> and <LRE>BLOCK1<PDF> and BLOCK1
have the same visual order by themselves [UAX9].
<RLO> and <LRO> should certainly be prohibited, because they "override" and change
the internal display order of the embedded string.
<LRM> and <RLM> should be prohibited even though they do not "override",
but I am not sure about this prohibition yet.
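As a rough illustration of the prohibition above, the following sketch reports override characters found in an identifier. The function name findProhibitedControls and its rejectMarks parameter are hypothetical. Because the prohibition of the directional marks (LRM, RLM) is still uncertain, that check is an assumption and can be switched off.

```javascript
// Hypothetical check; the character lists reflect this message, not a published profile.
const OVERRIDE_CONTROLS = new Map([
  ["\u202D", "LRO"], // LEFT-TO-RIGHT OVERRIDE
  ["\u202E", "RLO"], // RIGHT-TO-LEFT OVERRIDE
]);
const DIRECTIONAL_MARKS = new Map([
  ["\u200E", "LRM"], // LEFT-TO-RIGHT MARK
  ["\u200F", "RLM"], // RIGHT-TO-LEFT MARK
]);

function findProhibitedControls(identifier, rejectMarks = true) {
  const hits = [];
  for (let i = 0; i < identifier.length; i++) {
    const ch = identifier[i];
    if (OVERRIDE_CONTROLS.has(ch)) {
      hits.push({ index: i, name: OVERRIDE_CONTROLS.get(ch) });
    } else if (rejectMarks && DIRECTIONAL_MARKS.has(ch)) {
      // Assumption: marks are rejected only while rejectMarks is true.
      hits.push({ index: i, name: DIRECTIONAL_MARKS.get(ch) });
    }
  }
  return hits; // an empty array means nothing prohibited was found
}
```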
Background:
I ran into this problem with EAI MUA display conventions for an RtoL phrase and address.
(RtoL chars are shown in uppercase)
eg1)
(network order)
"FULLNAME" <LOCAL at DOMAIN.TLD>
(display order : distorted)
"DLT.NIAMOD at LACOL> "EMANLLUF> <=== SEE THIS!
eg2) workaround with RLE ... PDF
(storage order after inserting RLE ... PDF)
"<RLE>FULLNAME<PDF>" <<RLE>LOCAL at DOMAIN.TLD<PDF>>
(storage order after inserting RLE ... PDF outside)
<RLE>"FULLNAME"<PDF> <RLE><LOCAL at DOMAIN.TLD><PDF>
(display order after)
<DLT.NIAMOD at LACOL> "EMANLLUF"
eg3) workaround with LRE ... PDF
(storage order after inserting LRE ... PDF)
"<LRE>FULLNAME<PDF>" <<LRE>LOCAL at DOMAIN.TLD<PDF>>
(storage order after inserting LRE ... PDF outside)
<LRE>"FULLNAME"<PDF> <LRE><LOCAL at DOMAIN.TLD><PDF>
(display order after)
"EMANLLUF" <DLT.NIAMOD at LACOL>
(RLE/LRE)...(PDF) will have many practical uses in IDN/IMA identifier
display in the future. When copied and pasted, those bidi control chars will
be included in the copy buffer and then fed into nameprep200X, at least
on Windows XP SP2.
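The "outside" form of the eg2 and eg3 workarounds could be produced by an MUA roughly as follows. This is only a sketch: the function name formatMailboxForDisplay and its direction parameter are hypothetical, and the input values are assumed to contain no bidi controls of their own.

```javascript
// Hypothetical display helper: wraps the quoted phrase and the angle-bracketed
// address in separate embeddings, as in the "outside" storage order of eg2/eg3.
function formatMailboxForDisplay(displayName, address, direction = "rtl") {
  const open = direction === "rtl" ? "\u202B" /* RLE */ : "\u202A" /* LRE */;
  const close = "\u202C"; // PDF
  return `${open}"${displayName}"${close} ${open}<${address}>${close}`;
}
```

Text built this way is exactly what ends up in the copy buffer, so the leading and trailing controls have to be mapped out again before the address is processed as an identifier.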
You may test with these JavaScript samples if you have fonts with Hebrew chars.
%u202a== LRE
%u202b== RLE
%u202c== PDF
<script>
document.write(unescape('eg1) "%u05d0%u05d1%u05d2" <%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db> '));
document.writeln("<br>");
document.write(unescape('eg2) %u202b"%u05d0%u05d1%u05d2"%u202c %u202b<%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db>%u202c '));
document.writeln("<br>");
document.write(unescape('eg3) %u202a"%u05d0%u05d1%u05d2"%u202c %u202a<%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db>%u202c '));
document.writeln("<br>");
</script>
Soobok