Comments on IDNA Bidi

Harald Alvestrand harald at alvestrand.no
Mon Jan 14 22:30:31 CET 2008


Kenneth Whistler wrote:
>> If I read it correctly, you're saying that the string "A" in an LTR
>> paragraph has sor and eor set to L, but in the string "A<RLE>B<PDF>C",
>> the levels will be 0 1 0, and the sor and eor for the run containing "B"
>> will be RTL by X10.
>>
>> In the string "A<RLE>B<LRE>C<PDF><PDF>", the levels will be 0 1 2, and
>> the sor of B will be RTL, and the eor of B will be LTR.
>>     
>
> Yes, but as I indicated, all the explicit embedding stuff is
> irrelevant. All those codes are categorically ruled out by
> RFC 3454, and I don't think anything we are proposing allows
> them back in.
>   
It's only irrelevant if we can ban LRE and RLE *around* the domain
names, as well as within them.
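The point about embeddings *around* a name can be made concrete with a small sketch of the explicit-level rules. This is a simplified Python rendering of UBA rules X1-X10 that handles only LRE, RLE and PDF (U+202A, U+202B, U+202C). It omits overrides, depth overflow and paragraph separators, and the function names are hypothetical:

```python
# Explicit directional formatting characters (the only ones this sketch handles).
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def explicit_levels(text, paragraph_level=0):
    """Simplified X1-X9: give every character an embedding level and drop
    LRE/RLE/PDF themselves. Overrides and depth overflow are ignored."""
    stack = [paragraph_level]
    result = []
    for ch in text:
        current = stack[-1]
        if ch == RLE:
            stack.append((current + 1) | 1)   # next higher odd level
        elif ch == LRE:
            stack.append((current + 2) & ~1)  # next higher even level
        elif ch == PDF:
            if len(stack) > 1:                # an unmatched PDF is ignored
                stack.pop()
        else:
            result.append((ch, current))
    return result


def bidi_runs(text, paragraph_level=0):
    """Group characters into level runs; return (text, level, sor, eor)."""
    runs = []  # each entry: [run_text, level]
    for ch, level in explicit_levels(text, paragraph_level):
        if runs and runs[-1][1] == level:
            runs[-1][0] += ch
        else:
            runs.append([ch, level])

    def direction(a, b):
        # X10: the higher of the two adjacent levels decides; even -> L, odd -> R
        return "L" if max(a, b) % 2 == 0 else "R"

    out = []
    for i, (run_text, level) in enumerate(runs):
        # At the paragraph edges, the "other side" is the paragraph level.
        before = runs[i - 1][1] if i > 0 else paragraph_level
        after = runs[i + 1][1] if i + 1 < len(runs) else paragraph_level
        out.append((run_text, level, direction(before, level), direction(level, after)))
    return out


for s in ["A" + RLE + "B" + PDF + "C",
          "A" + RLE + "B" + LRE + "C" + PDF + PDF,
          "this is a " + RLE + "ABC" + PDF + " domain.NAME"]:
    print(bidi_runs(s))
```

It reproduces the levels 0 1 0 and 0 1 2 from the two strings quoted above. It also shows that in "this is a <RLE>ABC<PDF> domain.NAME" the run holding the name gets sor = R, even though the name itself contains no RTL characters.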
>> So again - which combinations of "sor" and "eor" values do you think we
>> should test for in the test described above?
>>     
>
> As above. They are defined by the test context.
>
> <sor> C D 1 ALEF A 2 BET E F <eor>
>
> <sor> GIMEL DALET ALEF 1  A 2  BET HE VAV <eor>
>
> Those *are* the paragraphs. The paragraph embedding levels are
> defined (according to P2) by the "C" in the first case and
> the "GIMEL" in the second case. And you derive the Bidi_Class
> of the <sor> and <eor> in each case directly from the levels
> of the runs. And in each case there is only one run, because
> we aren't allowing explicit embeddings and overrides.
What I take from this statement is that the paragraph

<sor>this is a <rle>ABC<pdf> domain.NAME<eor>

is a case that we don't need to test for. People who do that will get
weird results, but it's not a problem.

Is that what you are saying?

It makes life a LOT simpler if this is OK. Not quite as simple as
IDNA2003 assumed, but a lot simpler.
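For the case Kenneth describes, the test string is the whole paragraph and no embeddings are allowed. Deriving sor and eor then reduces to P2/P3 plus one comparison. A minimal Python sketch, assuming Python's unicodedata.bidirectional() as the source of Bidi_Class values. The function names are hypothetical, and HL1 overrides of the paragraph level are not modelled:

```python
import unicodedata


def paragraph_level(text):
    """P2/P3: level 0 if the first strong character is L, 1 if it is R or AL,
    and 0 if there is no strong character at all (the P3 default)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    return 0


def single_run_sor_eor(text):
    """With no explicit embeddings the whole paragraph is one run at the
    paragraph level, so sor and eor both follow from that level (X10)."""
    level = paragraph_level(text)
    direction = "L" if level % 2 == 0 else "R"
    return level, direction, direction


# Kenneth's two test paragraphs, written without spaces, using
# U+05D0 ALEF, U+05D1 BET, U+05D2 GIMEL, U+05D3 DALET, U+05D4 HE, U+05D5 VAV.
latin_first = "CD1\u05d0A2\u05d1EF"
hebrew_first = "\u05d2\u05d3\u05d01A2\u05d1\u05d4\u05d5"
print(single_run_sor_eor(latin_first))   # (0, 'L', 'L')
print(single_run_sor_eor(hebrew_first))  # (1, 'R', 'R')
```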

                        Harald


