Bidi simplification (Re: IDNA protocol checking/processing)

Harald Tveit Alvestrand harald at alvestrand.no
Mon Dec 3 19:30:35 CET 2007


--On 3 December 2007 09:47 -0800 Michel Suignard
<michelsu at windows.microsoft.com> wrote:

>> But I believe that the correct context for that discussion is the
>> bidi document, not the protocol document - if we can get bidi
>> right, the protocol document should have absolutely no need to do
>> anything but refer to it.
>
> Harald, I respectfully disagree, at least in the current forms of the two
> documents. As of now, the bidi document refers to a change in RFC 3454
> (stringprep) while the protocol document does w/o any external string
> preparation step, so it seems difficult to treat the bidi document as an
> appropriate referenced document. At minimum you should update the bidi
> document to reflect the new protocol document and be a proper reference.
> You should probably insert a new clause between 4 and 5 detailing the
> appropriate bidi test step to be used by the new idn protocol (ref clause
> 4.4 of the protocol document). Note that this message as well as the
> previous was taking into consideration both your document and the
> protocol document.
>
> I also saw the bidi document as a problem statement with a suggested
> solution in an external protocol document (RFC 3454 or successor). If the
> solution can be expressed in simple terms I don't see why the solution
> cannot be explicitly part of the protocol with a link to the bidi
> document for rationale. I don't see an issue with my proposed text being
> part of both the rationale (bidi document) and the protocol.

Thanks - I have now scanned the relevant section (it was mercifully short).

By "proposed text", do you mean section 7 of your document?
The operative text is:

    a.
        The string MUST NOT contain any "RCat" character,
    b.
        Or if it does, the string must satisfy all of these requirements

           1. The string MUST NOT contain any "LCat" character,
           2. The string MUST start with an "RCat" character,
           3. The string MUST either end with an "RCat" character, or end
              with an "RCat" character followed by a sequence of "NSMCat"
              characters.

Just to be sure I don't miss anything...
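
For concreteness, the rule as quoted above can be sketched in Python. This is
an illustration only, not text from either document. It assumes that "RCat"
means Unicode bidi class R or AL, "LCat" means class L, and "NSMCat" means
class NSM; the function name is made up for the example.

```python
import unicodedata

# Assumed mapping of the quoted category names onto Unicode bidi classes.
RCAT = {"R", "AL"}
LCAT = {"L"}
NSMCAT = {"NSM"}


def satisfies_bidi_rule(label: str) -> bool:
    """Return True if `label` passes the quoted rule (a) / (b.1-b.3)."""
    classes = [unicodedata.bidirectional(ch) for ch in label]

    # (a) No RCat character at all: the rule is satisfied.
    if not any(c in RCAT for c in classes):
        return True

    # (b.1) With an RCat character present, no LCat character is allowed.
    if any(c in LCAT for c in classes):
        return False

    # (b.2) The string must start with an RCat character.
    if classes[0] not in RCAT:
        return False

    # (b.3) The string must end with an RCat character, optionally
    # followed by a run of NSMCat characters.
    end = len(classes)
    while end > 0 and classes[end - 1] in NSMCAT:
        end -= 1
    return end > 0 and classes[end - 1] in RCAT
```

Under these assumptions a Hebrew label followed by a European digit fails
requirement 3, since the digit has neither class R/AL nor class NSM.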

> The bidi solution that I had extracted in my previous message was part of
> a larger document I had sent to this group on March 12th 2007, well
> before your last version of the bidi document at
> http://www.unicode.org/~suignard/udraft-suignard-idnaprep-00.html and was
> addressing roughly the same scope as the new protocol document. I have no
> intention of making it into a competing proposal, but would like some of its
> principles incorporated in a final protocol document. I also got very
> little feedback on my own earlier contribution so your 'unhappiness' for
> lack of comments is somehow shared ;-)

I'll work on it, and try to verify that the given rule does indeed satisfy
the "no rearrangement across boundaries" constraint. I've had trouble
writing code that actually tests this requirement: my attempted test runs
seem to indicate that if you embed a label between an LTR character and
an RTL character, most strings will violate the constraint.
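
One way to frame such a test is sketched below. It is a harness only, not the
test code mentioned above. It reads "no rearrangement across boundaries" as
"the label's characters occupy a contiguous run in display order", which is an
assumption. The `reorder` argument is a hypothetical stand-in for an
implementation of the Unicode bidirectional algorithm (logical string in,
visual string out); the Python standard library does not provide one.

```python
from typing import Callable


def label_stays_contiguous(
    reorder: Callable[[str], str],
    before: str,
    label: str,
    after: str,
) -> bool:
    """Embed `label` between `before` and `after`, reorder for display,
    and report whether the label's characters are still adjacent."""
    if not label:
        return True

    label_chars = set(label)
    # Label characters are located by value, so the context must not
    # reuse any of them.
    if label_chars & set(before + after):
        raise ValueError("context characters must not occur in the label")

    # `reorder` is assumed to only permute characters, not replace them.
    visual = reorder(before + label + after)
    positions = [i for i, ch in enumerate(visual) if ch in label_chars]

    # If characters were dropped or replaced, treat it as a failure.
    if len(positions) != len(label):
        return False

    # Contiguous means the positions form an unbroken range.
    return positions == list(range(positions[0], positions[0] + len(label)))
```

A real run would pass a bidi implementation as `reorder` and iterate over
candidate labels with an LTR character as `before` and an RTL character as
`after`, or the other way round.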

Thanks!

               Harald


