bidi control chars (RLE/LRE ... PDF) mapping out

Soobok Lee lsb at lsb.org
Sat Dec 16 06:31:23 CET 2006



Bidi-related Suggestion:
  A leading <RLE> or <LRE> and a trailing <PDF> in a Unicode identifier string
   can be safely mapped out to nothing without affecting the internal 
   display order of the identifier string, 
   while they should be prohibited in any other position of an identifier string.

   1) This can be done outside of nameprep/stringprep200X.
   2) This can be done within the scope of nameprep/stringprep200X.
     I support 2). It is simple to implement. Complex RtL normalization is needed.
     This can be categorized as a "context-dependent mapping out list".
     However, both 1) and 2) can be performed on the same input by clients.
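
As a rough illustration of the mapping suggested above, here is a minimal
JavaScript sketch. The function name mapOutBidiEmbedding is hypothetical and
is not part of nameprep or stringprep. It assumes that at most one leading
<RLE>/<LRE> and one trailing <PDF> are mapped out, and that a leading control
is dropped even when no matching <PDF> follows; the suggestion itself does not
spell out either point.

```javascript
// Code points: LRE = U+202A, RLE = U+202B, PDF = U+202C.
const LRE = '\u202A';
const RLE = '\u202B';
const PDF = '\u202C';

// Hypothetical helper, not an existing nameprep/stringprep step.
// Maps out one leading RLE/LRE and one trailing PDF, and rejects
// these three controls in any other position.
function mapOutBidiEmbedding(identifier) {
  let s = identifier;
  if (s.startsWith(LRE) || s.startsWith(RLE)) {
    s = s.slice(1); // drop the leading embedding control
  }
  if (s.endsWith(PDF)) {
    s = s.slice(0, -1); // drop the trailing PDF
  }
  // Anything left at this point sits in a prohibited position.
  if (/[\u202A-\u202C]/.test(s)) {
    throw new Error('bidi embedding control in a prohibited position');
  }
  return s;
}
```

A client could call this on a pasted identifier before handing the result to
the rest of its preparation steps, which corresponds to option 1) above.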

Reason:
  While <RLE or LRE> ... <PDF> sequences are used to embed 
   an R2L or L2R substring (often a Unicode identifier such as an IMA/IDN) 
   in a surrounding L2R or R2L string (often plain running text), respectively, 
  they DON'T change the internal display order of the embedded substring; 
   they only affect the embedded block's external display order 
     within the surrounding text.

  That is, <RLE>BLOCK1<PDF> and <LRE>BLOCK1<PDF> and BLOCK1
        have the same visual order by themselves [UAX9].

  <RLO> <LRO> should certainly be prohibited, because they "override" and change 
         the internal display order of the embedded string.
  <LRM> <RLM> should be prohibited even though they do not "override", 
         but I am not sure about this prohibition yet.
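
The prohibition check described above could look roughly like the following
JavaScript sketch. The function name findProhibitedBidiControls and its
prohibitMarks option are hypothetical; the option exists only because the
prohibition of the marks is still an open question here.

```javascript
// Overrides: LRO = U+202D, RLO = U+202E. Marks: LRM = U+200E, RLM = U+200F.
// Hypothetical helper: reports where prohibited bidi controls occur.
function findProhibitedBidiControls(identifier, { prohibitMarks = true } = {}) {
  const pattern = prohibitMarks
    ? /[\u202D\u202E\u200E\u200F]/g // overrides and marks
    : /[\u202D\u202E]/g;            // overrides only
  return Array.from(identifier.matchAll(pattern), (m) => ({
    index: m.index,
    codePoint: 'U+' + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
  }));
}
```

An empty result means that no prohibited control was found under the chosen
setting.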

Background:
I ran into this problem with EAI MUA display conventions for RtoL phrases and addresses. 
(RtoL characters are shown in uppercase.)

        eg1) 
        (network order)
           "FULLNAME" <LOCAL at DOMAIN.TLD>
        
        (display order : distorted)
           "DLT.NIAMOD at LACOL> "EMANLLUF>      <=== SEE THIS!

        eg2) workaround with RLE ... PDF
        (storage order after inserting RLE ... PDF)
           "<RLE>FULLNAME<PDF>" <<RLE>LOCAL at DOMAIN.TLD<PDF>>
        (storage order after inserting RLE ... PDF outside)
           <RLE>"FULLNAME"<PDF> <RLE><LOCAL at DOMAIN.TLD><PDF>
        
        (display order after)
           <DLT.NIAMOD at LACOL> "EMANLLUF"
       
        eg3) workaround with LRE ... PDF
        (storage order after inserting LRE ... PDF)
           "<LRE>FULLNAME<PDF>" <<LRE>LOCAL at DOMAIN.TLD<PDF>>
        (storage order after inserting LRE ... PDF outside)
           <LRE>"FULLNAME"<PDF> <LRE><LOCAL at DOMAIN.TLD><PDF>
        
        (display order after)
           "EMANLLUF" <DLT.NIAMOD at LACOL> 


(RLE/LRE)...(PDF) will have many practical uses in IDN/IMA identifier
display in the future.  When text is copied and pasted, those bidi control chars will
be included in the copy buffer and then fed into nameprep200X, at least
in Windows XP SP2.
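
To make the eg2 and eg3 workarounds above concrete, here is a minimal
JavaScript sketch of how an MUA might build the display string. The function
name embedForDisplay and its direction argument are hypothetical. It follows
the variant that puts the controls outside the quotes and angle brackets.

```javascript
// Code points: LRE = U+202A, RLE = U+202B, PDF = U+202C.
const LRE = '\u202A';
const RLE = '\u202B';
const PDF = '\u202C';

// Hypothetical helper: wraps the quoted phrase and the bracketed address
// each in its own embedding, RLE ... PDF for 'rtl' and LRE ... PDF otherwise.
function embedForDisplay(phrase, address, direction) {
  const open = direction === 'rtl' ? RLE : LRE;
  return `${open}"${phrase}"${PDF} ${open}<${address}>${PDF}`;
}
```

Text copied from such a display carries these controls along with it, which is
the input that would then reach nameprep200X.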

You may test with these JavaScript samples if you have fonts with Hebrew characters.

%u202a== LRE
%u202b== RLE
%u202c== PDF

<script>
// eg1) plain storage order, no bidi controls
document.write(unescape('eg1) "%u05d0%u05d1%u05d2" <%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db> '));
document.writeln("<br>");
// eg2) phrase and address each wrapped in RLE ... PDF
document.write(unescape('eg2) %u202b"%u05d0%u05d1%u05d2"%u202c %u202b<%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db>%u202c '));
document.writeln("<br>");
// eg3) phrase and address each wrapped in LRE ... PDF
document.write(unescape('eg3) %u202a"%u05d0%u05d1%u05d2"%u202c %u202a<%u05d3%u05d4%u05d5@%u05d6%u05d7%u05d8.%u05d9%u05da%u05db>%u202c '));
document.writeln("<br>");
</script>

Soobok

