[lsb@lsb.org: [EAI] (summary) display of RightToLeft chars in localparts and hostnames]

Harald Alvestrand harald at alvestrand.no
Thu Dec 7 10:12:31 CET 2006


Soobok Lee wrote:
> On Thu, Dec 07, 2006 at 09:36:22AM +0100, Harald Alvestrand wrote:
>   
>>>   200E; LEFT-TO-RIGHT MARK
>>>   200F; RIGHT-TO-LEFT MARK
>>>
>>> My suggestion for the new stringprep200x is to move these chars
>>>  to the "mapped to nothing" list. That is, how about silently deleting
>>>  them instead of prohibiting them and returning an error?
>>>       
>> Any string that contains them will (one assumes) depend on their correct 
>> interpretation for correct display.
>>
>> Mapping them out and letting people use the resulting string powerfully 
>> violates the principle of least astonishment; if I, for reasons of my own, 
>> choose to send in the string (in network order) <RLO> D N A R T S E V L A 
>> <PDF>, expecting to see the display ALVESTRAND, I will be astonished if the 
>> result is DNARTSEVLA.
>>
>> I'll be even more surprised if someone is able to register 
>> <RLO>DNARTSEVLA<LRO>.com and use that in a phishing attack on 
>> alvestrand.com - returning an error message is IMHO Exactly The Right Thing 
>> To Do.
>>     
>
> Thanks for your correction. Just deleting is NOT the right answer. I had
> not thought that through. :0
>
> My new suggestion is that stringprep process
>   <RLE>D N A R T S E V L A<PDF> ==> ALVESTRAND 
>   <LRE>YOD HE WOW HE<PDF> ==> HE WOW HE YOD ( in Hebrew)
>   instead of just deleting or prohibiting <RLE> and <LRE>.
>
> What do you think of this "just delete with reordering" approach?
> It won't complicate the stringprep algorithms much.
I suppose it's possible to execute the whole bidi algorithm of UAX#9 and 
re-code the result as some kind of "normalized RTL". Is there a 
normalization algorithm for bidi in Unicode?

But I don't see that it's reasonable to expect EVERY IDNA implementation 
to do this - complexity is WAY higher than for many other things.
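
To give a feel for where that complexity starts, here is a minimal Python
sketch of only the first step UAX#9 depends on: looking up each character's
bidi class. It uses the standard unicodedata module; the helper name
bidi_classes is made up for illustration. Resolving embedding levels and
reordering, which is the expensive part, is not attempted here.

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode Bidi_Class of each code point in text.

    Hypothetical helper: this is classification only, not the
    bidi algorithm itself (no level resolution, no reordering).
    """
    return [unicodedata.bidirectional(ch) for ch in text]

# RLO, two Latin letters, PDF
print(bidi_classes("\u202EAB\u202C"))
```

Everything after this lookup (explicit levels, weak and neutral type
resolution, reordering of runs) is what an implementation would have to add.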

If we make a clear separation between "allowed characters on the wire" 
and "advice to implementors on how they can help people recover from 
weird-encoding errors", this may go into the latter part.
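
As a rough sketch of the two "on the wire" policies discussed above, assuming
only the directional controls named in this thread (LRM, RLM, LRE, RLE, PDF,
LRO, RLO) are at issue: the function names prohibit and map_to_nothing are
illustrative, not taken from any stringprep implementation.

```python
# Directional controls named in this thread. The set is an assumption
# for illustration, not a complete stringprep table.
BIDI_CONTROLS = {
    "\u200E",  # LEFT-TO-RIGHT MARK
    "\u200F",  # RIGHT-TO-LEFT MARK
    "\u202A",  # LRE
    "\u202B",  # RLE
    "\u202C",  # PDF
    "\u202D",  # LRO
    "\u202E",  # RLO
}

def prohibit(label):
    """Reject any label containing a directional control (return an error)."""
    for ch in label:
        if ch in BIDI_CONTROLS:
            raise ValueError(f"prohibited bidi control U+{ord(ch):04X}")
    return label

def map_to_nothing(label):
    """Silently delete directional controls (the 'mapped to nothing' idea)."""
    return "".join(ch for ch in label if ch not in BIDI_CONTROLS)

spoof = "\u202EDNARTSEVLA\u202C"   # RLO ... PDF, as in the example above
print(map_to_nothing(spoof))       # the controls vanish, the letters stay in wire order
```

With map_to_nothing the string that was meant to display as ALVESTRAND comes
out as DNARTSEVLA, which is the astonishment described earlier; prohibit
refuses the same input with an error instead.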

                 Harald


