Comments on IDNA Bidi

Mark Davis mark.davis at icu-project.org
Fri Dec 14 04:43:52 CET 2007


I've collected together comments on the four documents, and tried to
organize them for reference. Here is the first set.

http://www.ietf.org/internet-drafts/draft-alvestrand-idna-bidi-01.txt


Overall comments:


Well documented, with clear examples justifying the problems to be solved.


Details:

Bidi-1.

   Note that Unicode 5.0 is the current version of Unicode.  This fix
   refers to Unicode 3.2 only, to maintain consistency with the rest of
   RFC 3454.  Nothing here should affect the relationship between
   Unicode versions and IDNA.

But making it specific to U3.2 *does* tie it to a particular version. Is the
intention for this to modify IDNA2003 before IDNAbis comes out? That doesn't
seem to be the case for the rest of the documents. Better would be for it to
refer to the version of Unicode used by IDNA (whatever version it is).


In the same vein, tying the comment to RFC 3454 is limiting, as the solution
that the document is proposing is in the context of IDNA-bis, which does away
with stringprep/nameprep. Overall, the document should take a more generic
view of the solution, not one specific to stringprep (RFC 3454).


Bidi-2.

   The following conditions MUST be true in both resulting strings for
   the string to be acceptable:

   o  The leftmost and rightmost character of the resulting string in
      display order must be a full stop (U+002E)

   o  No non-spacing mark (NSM) can occur in the second position of the
      string (leftmost in L order, rightmost in R order); that is, no
      mark can be allowed to attach to the delimiting characters.

   o  The direction of the leftmost and rightmost characters in the
      string (the periods) must be either L or R

The NSM condition should be part of the main IDNA conditions, not here.


Bidi-2a.

If you really want a test, it would be something like the following:



   1. At build time, produce a test set T of characters, one from each of
   the BIDI classes where a character can be in IDNA (e.g., excluding B,
   LRE/LRO, RLE/RLO, and PDF). That is, roughly 14 characters.
   2. To test a given prospective label L, perform the following over all
   possible 2-character strings X and Y drawn from T. (That is, this would
   be 14^4 iterations.)
   3. Create the string S formed from: X + L + Y
   4. Apply the BIDI algorithm to S twice, once with an RTL and once with
   an LTR paragraph direction.
   5. If, in the result, any of the characters in the label are separated
   by a character from X or Y, the test fails.
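
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not a definitive implementation: the
function names, the EXCLUDED set, and the `reorder` hook are all hypothetical.
Python's standard library exposes bidi classes but does not implement the
BIDI algorithm itself, so the sketch assumes the caller supplies a `reorder`
function that returns the logical indices of a string in visual order for a
given paragraph direction. A current Python will likely report more bidi
classes than the roughly 14 mentioned above, since its Unicode data is newer.

```python
import itertools
import unicodedata
from typing import Callable, List, Sequence

# Bidi classes assumed to be unavailable in IDNA labels. The list is an
# assumption for illustration; "" marks code points with no assigned class.
EXCLUDED = frozenset({"B", "LRE", "LRO", "RLE", "RLO", "PDF", ""})

# Hypothetical hook: takes (text, paragraph direction "LTR" or "RTL") and
# returns the logical indices of the text in visual order.
Reorder = Callable[[str, str], Sequence[int]]


def build_test_set(excluded=EXCLUDED) -> List[str]:
    """Step 1: pick one representative character per remaining bidi class."""
    representative = {}
    for code_point in range(0x110000):
        ch = chr(code_point)
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class not in excluded:
            # Keep the first character seen for each class.
            representative.setdefault(bidi_class, ch)
    return [representative[c] for c in sorted(representative)]


def label_stays_contiguous(label: str, test_set: Sequence[str],
                           reorder: Reorder) -> bool:
    """Steps 2-5: return False if any context splits the label on display."""
    # All 2-character strings over T; X and Y each range over this list.
    contexts = ["".join(p) for p in itertools.product(test_set, repeat=2)]
    for x, y in itertools.product(contexts, repeat=2):
        s = x + label + y                      # step 3
        start, end = len(x), len(x) + len(label)
        for direction in ("LTR", "RTL"):       # step 4
            visual = reorder(s, direction)
            # Visual positions occupied by the label's characters.
            positions = [pos for pos, logical in enumerate(visual)
                         if start <= logical < end]
            # Step 5: a gap means a character from X or Y got in between.
            if positions and positions[-1] - positions[0] != len(positions) - 1:
                return False
    return True
```

The number of iterations is len(T)^4 contexts times two paragraph directions,
so a harness like this is practical only as a build-time check.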


However, this should really not be proposed as something that users of IDNA
should do. Instead, it should be used to test that Michel's formulation is
correct.

Bidi-3.

   We believe that there is a clear likelihood of similar issues
   existing with other scripts and languages that are not currently used
   extensively with IDNs.  Careful consideration of all the languages
   written in a given script, in consultation with all of the
   corresponding speech communities, is therefore needed before we can
   say with any degree of certainty that using that script for IDNs is
   unproblematic.

This is not a bidi issue, and should be in a different document. (See other
comments about "speech communities".)


Bidi-4.

   Another set of issues concerns the proper display of IDNs with a
   mixture of LTR and RTL labels, or only RTL labels; it is not clear to
   these authors what the proper display order of the components of a
   domain name are if the directiion of the components (in network
   order) is, for instance, FirstRTL.SecondRTL.LTR - is it
   LTRtsriF.LTRdnoceS.LTR or LTRdnoceS.LTRtsrif.LTR?  Again, this memo
   does not attempt to suggest a solution to this problem.

If the question is what the BIDI algorithm does in such cases, the
answer is easy to determine. If the question is whether a user agent should
display a URL in a different order than the BIDI algorithm produces, I think
that's beyond the scope of this document. Note that any attempt to have it display
differently requires all text processors to recognize URLs and handle them
specially, with problems of interoperability and confusion when, inevitably,
most of them fail. So recommending a non-standard display will probably do
more harm than good.

Bidi-5.

   One particular example of the last case is if a program chooses to
   examine the last character (in network order) of a string in order to
   determine its directionality, rather than its first; if it finds an
   NSM character and tries to display the string as if it was a left-to-
   right string, the resulting display may be interesting, but not
   useful.

I don't understand this paragraph. When and why would this happen with
IDNA-conformant programs?