Comments on IDNA Bidi
Mark Davis
mark.davis at icu-project.org
Fri Dec 14 04:43:52 CET 2007
I've collected together comments on the four documents, and tried to
organize them for reference. Here is the first set.
http://www.ietf.org/internet-drafts/draft-alvestrand-idna-bidi-01.txt
Overall comments:
Well documented, with clear examples justifying the problems to be solved.
Details:
Bidi-1.
Note that Unicode 5.0 is the current version of Unicode. This fix
refers to Unicode 3.2 only, to maintain consistency with the rest of
RFC 3454. Nothing here should affect the relationship between
Unicode versions and IDNA.
But making it specific to U3.2 *does* tie it to a particular version. Is the
intention for this to modify IDNA2003 before IDNAbis comes out? That doesn't
seem to be the case for the rest of the documents. Better would be for it to
refer to the version of Unicode used by IDNA (whatever version it is).
In the same vein, tying the comment to RFC 3454 is limiting, since the
solution the document proposes is in the context of IDNA-bis, which does
away with stringprep/nameprep. Overall, the document should take a more
generic view of the solution rather than one specific to stringprep (RFC 3454).
Bidi-2.
The following conditions MUST be true in both resulting strings for
the string to be acceptable:
o The leftmost and rightmost character of the resulting string in
display order must be a full stop (U+002E)
o No non-spacing mark (NSM) can occur in the second position of the
string (leftmost in L order, rightmost in R order); that is, no
mark can be allowed to attach to the delimiting characters.
o The direction of the leftmost and rightmost characters in the
string (the periods) must be either L or R
The NSM condition should be part of the main IDNA conditions, not here.
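As a rough illustration of what such a label-level condition might look like
if it lived with the main IDNA checks: the sketch below reads the condition as
"a label must not begin with a non-spacing mark", since a leading NSM would
attach to the preceding delimiter. That reading, and the function name
starts_with_nsm, are assumptions for illustration, not text from any draft.

```python
import unicodedata


def starts_with_nsm(label: str) -> bool:
    """Return True if the first character of the label has bidi class NSM.

    Assumption: the label is given in logical (network) order as a string
    of code points. An empty label is reported as False.
    """
    return bool(label) and unicodedata.bidirectional(label[0]) == "NSM"
```

A label beginning with U+0301 (COMBINING ACUTE ACCENT) would be flagged by
this check, while a label beginning with a Latin letter would not.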
Bidi-2a.
If you really want a test, it would be something like the following:
1. At build time, produce a test set T of characters, one from each of
the BIDI classes where a character can be in IDNA (e.g., excluding B, LRE/O,
RLE/O, and PDF). That is, roughly 14 characters.
2. To test a given prospective label L, perform the following over all
possible 2-character strings X and Y from T. (That is, this would be 14^4
iterations.)
3. Create the string S formed from: X + L + Y
4. Apply the BIDI algorithm to S twice, once with an RTL and once with an
LTR paragraph direction.
5. If in the result any of the characters in the label are separated
by a character from X or Y, the test fails.
However, this should really not be proposed as something that users of IDNA
should do. Instead, it should be used to test that Michel's formulation is
correct.
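For anyone who wants to try the Bidi-2a procedure, here is a minimal sketch of
the test loop. It does not implement the BIDI algorithm itself: the caller
supplies a reorder function, and the interface assumed here (take a string and
a paragraph direction, return the logical indices in visual order) is
hypothetical. The names label_stays_contiguous and identity_reorder are
likewise made up for illustration.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical interface: reorder(text, base_rtl) returns the logical indices
# of `text` listed in visual (left-to-right) order.
Reorder = Callable[[str, bool], Sequence[int]]


def label_stays_contiguous(label: str, test_set: Sequence[str],
                           reorder: Reorder) -> bool:
    """Run steps 2-5: fail if any context character splits the label."""
    # All 2-character strings over T (step 2).
    contexts = ["".join(pair) for pair in product(test_set, repeat=2)]
    for x, y in product(contexts, repeat=2):
        s = x + label + y                        # step 3
        start, end = len(x), len(x) + len(label)
        for base_rtl in (False, True):           # step 4: LTR, then RTL
            visual = reorder(s, base_rtl)
            # Visual positions occupied by characters of the label.
            slots = [pos for pos, idx in enumerate(visual)
                     if start <= idx < end]
            # Step 5: the label must occupy one unbroken run of positions.
            if slots and slots[-1] - slots[0] + 1 != len(slots):
                return False
    return True


def identity_reorder(text: str, base_rtl: bool) -> Sequence[int]:
    """Stand-in that performs no reordering; NOT a BIDI implementation."""
    return list(range(len(text)))
```

With identity_reorder the check trivially passes for any label; a real run
would plug in a conformant BIDI implementation and a test set T built as in
step 1.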
Bidi-3.
We believe that there is a clear likelihood of similar issues
existing with other scripts and languages that are not currently used
extensively with IDNs. Careful consideration of all the languages
written in a given script, in consultation with all of the
corresponding speech communities, is therefore needed before we can
say with any degree of certainty that using that script for IDNs is
unproblematic.
This is not a bidi issue, and should be in a different document. (See other
comments about "speech communities".)
Bidi-4.
Another set of issues concerns the proper display of IDNs with a
mixture of LTR and RTL labels, or only RTL labels; it is not clear to
these authors what the proper display order of the components of a
domain name are if the direction of the components (in network
order) is, for instance, FirstRTL.SecondRTL.LTR - is it
LTRtsriF.LTRdnoceS.LTR or LTRdnoceS.LTRtsrif.LTR? Again, this memo
does not attempt to suggest a solution to this problem.
If the question is: what does the BIDI algorithm do in such cases, the
answer is easy to determine. If the question is whether a user agent should
display a URL in a different order than the BIDI algorithm, I think that's
beyond the scope of this document. Note that any attempt to have it display
differently requires all text processors to recognize URLs and handle them
specially, with problems of interoperability and confusion when, inevitably,
most of them fail. So recommending a non-standard display will probably do
more harm than good.
Bidi-5.
One particular example of the last case is if a program chooses to
examine the last character (in network order) of a string in order to
determine its directionality, rather than its first; if it finds an
NSM character and tries to display the string as if it was a left-to-
right string, the resulting display may be interesting, but not
useful.
I don't understand this paragraph. When and why would this happen with
IDNA-conformant programs?