comments on draft-ietf-idnabis-bidi

Tue Feb 10 09:28:19 CET 2009

My attention was recently drawn to the subject document (version 03) and I 
have a number of comments.  Some of them are very minor (typos, editorial) 
and reflect my pedantic mind, but I thought that I might as well help 
improve the form of the document.  Other comments go more to the 
essence, and I would appreciate their being considered seriously.

1) In section 2, first paragraph, "satisifes" should be "satisfies".

2) Section 2, rule 1 mentions the "Character Grouping requirement" for the 
first time in the document.  Either there should be a forward reference to 
section 3 where it will be explained, or (better, in my opinion), the 
content of the current section 3 should precede the content of the current 
section 2.

3) In the sentence "ET is excluded because the string L ET does not 
satisfy the Character Grouping requirement.", "L" seems to represent a 
label, but can easily be confused with the L Bidi property (all the more 
since it is adjacent to ET which surely represents a character with the ET 
Bidi property).

4) In the sentence "CS is excluded because the string L CS does not 
satisfy the Character Grouping requirement.", "L" seems to represent a 
label, but can easily be confused with the L Bidi property (all the more 
since it is adjacent to CS which surely represents a character with the CS 
Bidi property).

5) I see no reason why CS is excluded while ES is allowed.  Both can be 
the source of the same kind of violation of the Character Grouping 
requirement.  ES characters are excluded from the first and last positions 
by rules 2 and 3.  With the same restrictions (exclusion from the first 
and last positions), CS and ET characters can be allowed and will not 
violate the Character Grouping requirement any more than ES characters do.

6) In section 1.1, there appears the following statement: "This 
specification is not intended to place any requirements on domain names 
that do not contain right-to-left characters."
Also the title of section 2 is "A replacement for the RFC 3454 BIDI rule" 
which implies that the text only deals with "Bidi" labels.
If that means that the specification applies only to labels which contain 
at least one character with Bidi property R, AL or AN, and we combine that 
with rule 4 "If an R, AL or AN is present, no L may be present.", then an 
L character can never be part of a Bidi label, and the L should be removed 
from the list of allowed Bidi properties in rule 1.

7) In [UAX9], rule X9 says that BN characters must be removed from the 
displayed text.  Any such invisible character violates the Label 
Uniqueness requirement.  BN characters must not be allowed by rule 1.

8) From rules 1, 2, 4, 6 and 7, plus my comments 6 and 7 above, it 
follows that the first character of a Bidi label can only be of type R or 
AL.  Such a statement can advantageously replace rules 2, 6 and 7.

9) Rule 5 includes no justification.  While a mixture of AN and EN 
characters in the same label seems odd and not required in real life 
situations, it is not clear what requirement would be violated by such a 
combination.

10) The rules allow AN or EN digits to appear in the last position of a 
label (unlike RFC 3454).  Let us consider the following examples 
(where lower case letters represent L characters and upper case letters 
represent R or AL characters):

   a. network order = "ABC123.456xyz"
      display order (LTR) = "123.456CBAxyz"
      display order (RTL) = "123.456xyzCBA"

   b. network order = "ABC.456-xyz"
      display order (LTR) = "456.CBA-xyz"
      display order (RTL) = "xyz-456.CBA"

   c. network order = "ABC123.456.xyz"
      display order (LTR) = "123.456CBA.xyz"
      display order (RTL) = "xyz.123.456CBA"

   d. network order = "ABC.456.xyz"
      display order (LTR) = "456.CBA.xyz"
      display order (RTL) = "xyz.456.CBA"

Examples a, b and c show very ugly violations of the Character Grouping 
requirement.  Since the document does not place requirements on non-Bidi 
labels, any non-Bidi label starting with digits following a Bidi label 
will cause a Character Grouping violation.  If Bidi labels are restricted 
from ending with digits (optionally followed by NSMs), then non-Bidi 
labels which contain only digits (example d) following a Bidi label will 
not cause a Character Grouping violation.
Whether this modest benefit justifies imposing such a restriction is 
subject to discussion.
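The display orders in examples a to d can be checked mechanically.  Below 
is a minimal Python sketch, a deliberately simplified reading of [UAX9] for 
a single paragraph with no explicit embeddings, covering only rules W4, W6, 
W7, N1, N2, I1, I2 and L2.  It assumes the notation of the examples (upper 
case = R, lower case = L, digit = EN, "." = CS, "-" = ES); AL, AN, ET and 
NSM are not modelled, so it does not show the EN-to-AN conversion that 
follows Arabic letters.  The function names are hypothetical.

```python
# Simplified sketch of the UAX #9 steps needed for examples a-d.
# Hypothetical notation (from the examples): upper case = R, lower case = L,
# digit = EN, "." = CS, "-" = ES; any other character is treated as ON.
def bidi_class(ch):
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "EN"
    return {".": "CS", "-": "ES"}.get(ch, "ON")


def display_order(text, base_level):
    """Visual order of `text` in a paragraph of level 0 (LTR) or 1 (RTL).

    Assumes no explicit embeddings, no AL/AN/ET/NSM and no line breaking.
    """
    if base_level not in (0, 1):
        raise ValueError("base_level must be 0 or 1")
    types = [bidi_class(c) for c in text]
    n = len(types)
    e = "L" if base_level == 0 else "R"  # embedding direction
    sos = eos = e  # without embeddings, sos/eos follow the paragraph level

    # W4: a single ES or CS between two EN becomes EN.
    for i in range(1, n - 1):
        if types[i] in ("ES", "CS") and types[i - 1] == types[i + 1] == "EN":
            types[i] = "EN"
    # W6: remaining separators become ON.
    types = ["ON" if t in ("ES", "CS") else t for t in types]
    # W7: EN whose preceding strong type (or sos) is L becomes L.
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    # N1/N2: a run of ON takes the direction of its neighbours when they
    # agree (EN counts as R), otherwise the embedding direction.
    def as_strong(t):
        return "R" if t == "EN" else t

    i = 0
    while i < n:
        if types[i] != "ON":
            i += 1
            continue
        j = i
        while j < n and types[j] == "ON":
            j += 1
        before = as_strong(types[i - 1]) if i > 0 else sos
        after = as_strong(types[j]) if j < n else eos
        types[i:j] = [before if before == after else e] * (j - i)
        i = j

    # I1/I2: resolve the embedding level of each character.
    bump = {"L": 0, "R": 1, "EN": 2} if base_level == 0 else {"L": 1, "R": 0, "EN": 1}
    levels = [base_level + bump[t] for t in types]

    # L2: from the highest level down to 1, reverse every run at that level
    # or higher (levels move together with their characters).
    chars = list(text)
    for lvl in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < lvl:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= lvl:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)
```

With base_level 0 for an LTR paragraph and 1 for an RTL paragraph, this 
sketch reproduces the display orders listed above; for instance 
display_order("ABC.456-xyz", 1) returns "xyz-456.CBA".  In this model the 
label "123-456" on its own comes out as "123-456" for both base levels.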

11) Towards the end of section 2, there appears the following sentence:
"In a domain name consisting of only labels that pass the test, the 
requirements of Section 3 are satisfied."
This is not true for domain names such as those in the examples above, 
unless non-Bidi labels are excluded, which is a very severe constraint.

12) The next sentence says: "In a domain name consisting of only 
LDH-labels and labels that pass the test, the requirements of Section 3 
are satisfied as long as a label that starts with an ASCII digit does not 
come after a right-to-left label that ends in a digit."
This is not true.  See example b above.

13) In section 3, there appears the sentence: "the label "123-456" will 
have a different display order in an RTL context than in a LTR context."
This is not true, IMHO.  If the last letter before the label is not an 
Arabic letter, it will be displayed as "123-456" in both LTR and RTL 
contexts.  If it is an Arabic letter, it will be displayed as "456-123".

14) In section 3, there appears the sentence: "The Label Uniqueness 
property should hold true between LTR paragraphs and RTL paragraphs.  This 
was shown to be unsound."
In fact, in all cases where Character Grouping and Label Uniqueness are 
satisfied for each paragraph direction separately, there will be Label 
Uniqueness between LTR and RTL paragraphs.

15) In section 3, since an "unproblematic label" can be a label which 
satisfies the requirements, the clause "any label S1 and S2 that is either 
a label satisfying the requirements or an unproblematic label" can be 
shortened to "any label S1 and S2 that is an unproblematic label".

16) In the formal statement of the Label Uniqueness requirement, there is 
no provision (or exclusion) for the case where L and L' are identical.

17) In summary I suggest that the rules in section 2 should be 
reformulated as below.

   1.  Only characters with the BIDI properties R, AL, AN, EN, ES,
       CS, ET, ON and NSM are allowed in RTL labels.

   2.  The first position must be a character with Bidi property R or AL.

   3.  The last position must be a character with Bidi property R or AL,
       followed by zero or more NSM.

   3 variant.  The last position must be a character with Bidi property R,
       AL, EN or AN, followed by zero or more NSM.

   4 (debatable).  If an EN is present, no AN may be present, and vice
       versa.

It can be seen that this formulation is quite close to that in RFC 3454, 
while solving all the problems that the subject document aims to solve.
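To make the proposed rules concrete, here is a minimal Python sketch of a 
checker for a single label already known to be an RTL label.  It is only 
an illustration: the function name violated_rules and its two flags 
(trailing_digits_ok for the "3 variant", en_an_mix_ok to drop the 
debatable rule 4) are hypothetical, and the Bidi properties come from 
unicodedata.bidirectional(), i.e. from whatever Unicode version the Python 
build ships.

```python
import unicodedata

# Bidi properties permitted by the proposed rule 1 (note: no L and no BN).
ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "NSM"}


def violated_rules(label, trailing_digits_ok=False, en_an_mix_ok=False):
    """Return the sorted list of proposed rules (1-4) that an RTL label violates.

    Hypothetical helper. trailing_digits_ok=True applies the "3 variant";
    en_an_mix_ok=True drops the debatable rule 4.
    """
    if not label:
        raise ValueError("empty label")
    classes = [unicodedata.bidirectional(ch) for ch in label]
    violated = set()

    # Rule 1: only the listed Bidi properties may occur.
    if any(c not in ALLOWED for c in classes):
        violated.add(1)

    # Rule 2: the first character must be R or AL.
    if classes[0] not in ("R", "AL"):
        violated.add(2)

    # Rule 3 (or its variant): skip trailing NSMs, then test the last character.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last_ok = ("R", "AL", "EN", "AN") if trailing_digits_ok else ("R", "AL")
    if end == 0 or classes[end - 1] not in last_ok:
        violated.add(3)

    # Rule 4 (debatable): EN and AN must not both be present.
    if not en_an_mix_ok and "EN" in classes and "AN" in classes:
        violated.add(4)

    return sorted(violated)
```

For example, the Hebrew word shalom followed by the ASCII digit "1" fails 
only rule 3 unless trailing_digits_ok is set, and a label containing 
U+200C ZERO WIDTH NON-JOINER (Bidi property BN) fails rule 1, in line with 
comment 7.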

Shalom (Regards),  Mati
           Bidi Architect
           Globalization Center Of Competency - Bidirectional Scripts
           IBM Israel
           Phone: +972 2 5888802    Fax: +972 2 5870333    Mobile: +972 52 2554160