Comments on IDNA Bidi

Mark Davis mark.davis at icu-project.org
Thu Jan 10 01:25:14 CET 2008


I sent this almost a month ago, and got no reply. I'm assuming that the lack
of response was due to the holidays, and that some discussion of or response
to these items will be forthcoming soon.

Mark

On Dec 13, 2007 7:43 PM, Mark Davis <mark.davis at icu-project.org> wrote:

> I've collected together comments on the four documents, and tried to
> organize them for reference. Here is the first set.
>
> http://www.ietf.org/internet-drafts/draft-alvestrand-idna-bidi-01.txt
>
>
> Overall comments:
>
>
> Well documented, with clear examples justifying the problems to be solved.
>
>
>
> Details:
>
> Bidi-1.
>
>    Note that Unicode 5.0 is the current version of Unicode.  This fix
>    refers to Unicode 3.2 only, to maintain consistency with the rest of
>    RFC 3454.  Nothing here should affect the relationship between
>    Unicode versions and IDNA.
>
> But making it specific to U3.2 *does* tie it to a particular version. Is
> the intention for this to modify IDNA2003 before IDNAbis comes out? That
> doesn't seem to be the case for the rest of the documents. Better would be
> for it to refer to the version of Unicode used by IDNA (whatever version it
> is).
>
>
>  In the same vein, tying the comment to RFC 3454 is limiting, as the
> solution that the document proposes is in the context of IDNA-bis, which
> does away with stringprep/nameprep. Overall the document should take a more
> generic view of the solution, not one specific to stringprep (RFC 3454).
>
>
> Bidi-2.
>
>    The following conditions MUST be true in both resulting strings for
>    the string to be acceptable:
>
>    o  The leftmost and rightmost character of the resulting string in
>       display order must be a full stop (U+002E)
>
>    o  No non-spacing mark (NSM) can occur in the second position of the
>       string (leftmost in L order, rightmost in R order); that is, no
>       mark can be allowed to attach to the delimiting characters.
>
>    o  The direction of the leftmost and rightmost characters in the
>       string (the periods) must be either L or R
>
> The NSM condition should be part of the main IDNA conditions, not here.
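
As an illustration of how small that NSM condition is when stated on the
label itself, here is a minimal Python sketch. It assumes the label is
examined in network (logical) order, so that "second position of the
string" means the first character after the delimiting full stop. The
function name starts_with_nsm is made up for this example, and Python's
unicodedata reflects whatever Unicode version the interpreter ships with,
not necessarily the one IDNA uses.

```python
import unicodedata


def starts_with_nsm(label: str) -> bool:
    """Return True if the first character of the label has bidi class NSM.

    Hypothetical helper: the label is taken in network order, without the
    surrounding full stops.
    """
    return bool(label) and unicodedata.bidirectional(label[0]) == "NSM"
```

A label such as "\u0301abc" (beginning with U+0301 COMBINING ACUTE ACCENT)
would be flagged, while "abc" would not.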
>
>
>  Bidi-2a.
>
>
>  If you really want a test, it would be something like the following:
>
>
>
>    1. At build time, produce a test set T of characters, one from each
>    of the BIDI classes where a character can be in IDNA (e.g. excluding B, LRE/O,
>    RLE/O, and PDF). That is, roughly 14 characters.
>    2. To test a given prospective label L, perform the following over
>    all possible 2-character strings X and Y from T. (That is, this would be
>    14^4 iterations.)
>    3. Create the string S formed from: X + L + Y
>    4. Apply the BIDI algorithm to S twice, once with an RTL and once
>    with an LTR paragraph direction.
>    5. If in the result any of the characters in the label are separated
>    by a character from X or Y, the test fails.
>
>
> However, this should really not be proposed as something that users of
> IDNA should do. Instead, it should be used to test that Michel's formulation
> is correct.
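
The test procedure in Bidi-2a can be sketched in Python roughly as follows.
This is only a harness under stated assumptions, not a definitive
implementation: the names build_test_set, label_stays_contiguous and
passes_context_test are invented for the example, and the BIDI algorithm
itself is not implemented here. The caller has to pass in a function
(called reorder below, a hypothetical hook) that takes a string and a
paragraph direction and returns the source indices in display order. The
extra isolate classes in the exclusion set were added to Unicode after 5.0
and are skipped only because a current Python would otherwise report them.

```python
import itertools
import unicodedata

# Bidi classes assumed not to occur in IDNA labels (B, LRE/O, RLE/O, PDF),
# plus the isolate classes that later Unicode versions introduced.
EXCLUDED = {"B", "LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def build_test_set(limit=0x3000):
    """Step 1: pick one representative character per remaining bidi class."""
    reps = {}
    for cp in range(limit):
        cls = unicodedata.bidirectional(chr(cp))
        # Unassigned code points report an empty class and are skipped.
        if cls and cls not in EXCLUDED and cls not in reps:
            reps[cls] = chr(cp)
    return sorted(reps.values())


def label_stays_contiguous(visual_order, start, end):
    """Step 5: True if source indices start..end-1 form one unbroken run."""
    positions = [i for i, src in enumerate(visual_order) if start <= src < end]
    if not positions:
        return True
    return positions[-1] - positions[0] + 1 == len(positions)


def passes_context_test(label, test_set, reorder):
    """Steps 2-5: try every 2-character prefix X and suffix Y drawn from T.

    reorder(text, base) is a caller-supplied hook (an assumption of this
    sketch) returning the list of source indices in display order for
    paragraph direction base, which is "L" or "R".
    """
    for x in itertools.product(test_set, repeat=2):
        for y in itertools.product(test_set, repeat=2):
            s = "".join(x) + label + "".join(y)
            for base in ("L", "R"):
                order = reorder(s, base)
                if not label_stays_contiguous(order, 2, 2 + len(label)):
                    return False
    return True
```

With a real BIDI implementation plugged in as reorder, the loop runs
len(T)^4 contexts, each in both paragraph directions. An identity function
such as lambda s, base: list(range(len(s))) exercises only the harness and
says nothing about any actual label.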
>
> Bidi-3.
>
>    We believe that there is a clear likelihood of similar issues
>    existing with other scripts and languages that are not currently used
>    extensively with IDNs.  Careful consideration of all the languages
>    written in a given script, in consultation with all of the
>    corresponding speech communities, is therefore needed before we can
>    say with any degree of certainty that using that script for IDNs is
>    unproblematic.
>
> This is not a bidi issue, and should be in a different document. (See
> other comments about "speech communities")
>
>
> Bidi-4.
>
>    Another set of issues concerns the proper display of IDNs with a
>    mixture of LTR and RTL labels, or only RTL labels; it is not clear to
>    these authors what the proper display order of the components of a
>    domain name are if the direction of the components (in network
>    order) is, for instance, FirstRTL.SecondRTL.LTR - is it
>    LTRtsriF.LTRdnoceS.LTR or LTRdnoceS.LTRtsriF.LTR?  Again, this memo
>    does not attempt to suggest a solution to this problem.
>
> If the question is: what does the BIDI algorithm do in such cases, the
> answer is easy to determine. If the question is whether a user agent should
> display a URL in a different order than the BIDI algorithm, I think that's
> beyond the scope of this document. Note that any attempt to have it display
> differently requires all text processors to recognize URLs and handle them
> specially, with problems of interoperability and confusion when, inevitably,
> most of them fail. So recommending a non-standard display will probably do
> more harm than good.
>
> Bidi-5.
>
>    One particular example of the last case is if a program chooses to
>    examine the last character (in network order) of a string in order to
>    determine its directionality, rather than its first; if it finds an
>    NSM character and tries to display the string as if it was a left-to-
>    right string, the resulting display may be interesting, but not
>    useful.
>
> I don't understand this paragraph. When and why would this happen with
> IDNA-conformant programs?
>
>


-- 
Mark


More information about the Idna-update mailing list