comments on draft-ietf-idnabis-bidi
Vint Cerf
vint at google.com
Tue Feb 10 15:36:39 CET 2009
Thanks for these precise comments, Mati.
Harald, I hope you can assess them and incorporate them as appropriate
into a revised draft.
vint
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com
On Feb 10, 2009, at 3:28 AM, Matitiahu Allouche wrote:
>
> My attention was recently drawn to the subject document (version
> 03), and I have a number of comments. Some of them are very minor
> (typos, editorial) and reflect my pedantic mind, but I thought that
> I might as well help improve the form of the document. Other
> comments go more to the essence, and I would appreciate their
> serious consideration.
>
> 1) In section 2, first paragraph, "satisifes" should be "satisfies".
>
> 2) Section 2, rule 1 mentions the "Character Grouping requirement"
> for the first time in the document. Either there should be a
> forward reference to section 3 where it will be explained, or
> (better, in my opinion), the content of the current section 3
> should precede the content of the current section 2.
>
> 3) In the sentence "ET is excluded because the string L ET does not
> satisfy the Character Grouping requirement.", "L" seems to
> represent a label, but can easily be confused with the L Bidi
> property (all the more since it is adjacent to ET which surely
> represents a character with the ET Bidi property).
>
> 4) In the sentence "CS is excluded because the string L CS does not
> satisfy the Character Grouping requirement.", "L" seems to
> represent a label, but can easily be confused with the L Bidi
> property (all the more since it is adjacent to CS which surely
> represents a character with the CS Bidi property).
>
> 5) I see no reason why CS is excluded while ES is allowed. Both
> can be the source of the same kind of violation of the Character
> Grouping requirement. ES characters are excluded from the first
> and last positions by rules 2 and 3. With the same restrictions
> (exclusion from the first and last positions), CS and ET characters
> can be allowed and will not violate the Character Grouping
> requirement any more than ES characters do.
>
> 6) In section 1.1, there appears the following statement: "This
> specification is not intended to place any requirements on domain
> names that do not contain right-to-left characters."
> Also the title of section 2 is "A replacement for the RFC 3454 BIDI
> rule" which implies that the text only deals with "Bidi" labels.
> If that means that the specification applies only to labels which
> contain at least one character with Bidi property R, AL or AN, and
> we combine that with rule 4 "If an R, AL or AN is present, no L may
> be present.", then an L character can never be part of a Bidi
> label, and the L should be removed from the list of allowed Bidi
> properties in rule 1.
>
> 7) In [UAX9], rule X9 says that BN characters must be removed from
> the displayed text. A label containing such an invisible character
> therefore displays exactly like the same label without it, which
> violates the Label Uniqueness requirement. BN characters must not
> be allowed by rule 1.
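To make comment 7 concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper `displayed_text` is hypothetical and models only the BN part of rule X9; it shows two distinct labels that become identical once BN characters are dropped.

```python
import unicodedata


def displayed_text(label):
    """Drop characters whose Bidi property is BN, as rule X9 of [UAX9]
    does before display. (X9 also drops the explicit embedding and
    override controls, which have their own Bidi properties; they are
    ignored in this sketch.)"""
    return "".join(ch for ch in label if unicodedata.bidirectional(ch) != "BN")


plain = "\u05d0\u05d1\u05d2"            # HEBREW LETTERS ALEF, BET, GIMEL
with_zwsp = "\u05d0\u200b\u05d1\u05d2"  # same, with U+200B ZERO WIDTH SPACE

print(unicodedata.bidirectional("\u200b"))                 # BN
print(plain == with_zwsp)                                  # False: distinct labels
print(displayed_text(plain) == displayed_text(with_zwsp))  # True: same display
```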
>
> 8) From rules 1, 2, 4, 6 and 7, plus my comments 6 and 7 above, it
> follows that the first character of a Bidi label can only be of
> type R or AL. Such a statement can advantageously replace rules 2,
> 6 and 7.
>
> 9) Rule 5 includes no justification. While a mixture of AN and EN
> characters in the same label seems odd and not required in real
> life situations, it is not clear what requirement would be violated
> by such a combination.
>
> 10) The rules allow AN or EN digits to appear in the last position
> of a label (unlike RFC 3454). Let us consider the following
> examples (where lowercase letters represent L characters and
> uppercase letters represent R or AL characters):
>
> a. network order       = "ABC123.456xyz"
>    display order (LTR) = "123.456CBAxyz"
>    display order (RTL) = "123.456xyzCBA"
>
> b. network order       = "ABC.456-xyz"
>    display order (LTR) = "456.CBA-xyz"
>    display order (RTL) = "xyz-456.CBA"
>
> c. network order       = "ABC123.456.xyz"
>    display order (LTR) = "123.456CBA.xyz"
>    display order (RTL) = "xyz.123.456CBA"
>
> d. network order       = "ABC.456.xyz"
>    display order (LTR) = "456.CBA.xyz"
>    display order (RTL) = "xyz.456.CBA"
>
> Examples a, b and c show very ugly violations of the Character
> Grouping requirement. Since the document places no requirements on
> non-Bidi labels, a non-Bidi label that starts with digits and
> follows a Bidi label can cause a Character Grouping violation. If
> Bidi labels are prohibited from ending with digits (optionally
> followed by NSMs), then a non-Bidi label that contains only digits
> and follows a Bidi label (example d) will not cause a Character
> Grouping violation.
> Whether this modest benefit justifies imposing such a restriction
> is subject to discussion.
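A minimal Python sketch of the Character Grouping check applied to the examples above. The hypothetical helper `grouping_violations` does not run the bidi algorithm; it takes the display strings listed in the examples as given and reports which labels are not shown as one contiguous block. It assumes, as holds in these examples, that each non-dot character occurs only once in the domain name.

```python
def grouping_violations(network, display):
    """Return the labels of `network` whose characters do not form one
    contiguous block in `display`.

    Assumes every non-dot character occurs exactly once in the domain
    name, so it can be located in `display` by value.
    """
    violations = []
    for label in network.split("."):
        positions = [display.index(ch) for ch in label]
        # A contiguous block spans exactly len(label) display positions.
        if max(positions) - min(positions) + 1 != len(label):
            violations.append(label)
    return violations


# Network order and LTR display order from examples a-d above.
print(grouping_violations("ABC123.456xyz", "123.456CBAxyz"))    # ['ABC123', '456xyz']
print(grouping_violations("ABC.456-xyz", "456.CBA-xyz"))        # ['456-xyz']
print(grouping_violations("ABC123.456.xyz", "123.456CBA.xyz"))  # ['ABC123']
print(grouping_violations("ABC.456.xyz", "456.CBA.xyz"))        # []
```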
>
> 11) Towards the end of section 2, there appears the following
> sentence: "In a domain name consisting of only labels that pass the
> test, the requirements of Section 3 are satisfied."
> This is not true for domain names such as those in the examples
> above, unless non-Bidi labels are excluded, which is a very severe
> constraint.
>
> 12) The next sentence says: "In a domain name consisting of only
> LDH-labels and labels that pass the test, the requirements of
> Section 3 are satisfied as long as a label that starts with an
> ASCII digit does not come after a right-to-left label that ends in
> a digit."
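> This is not true; see example b above, where the right-to-left
> label "ABC" does not end in a digit, yet the label "456-xyz" is
> split apart in the LTR display "456.CBA-xyz".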
>
> 13) In section 3, there appears the sentence: "the label "123-456"
> will have a different display order in an RTL context than in a LTR
> context."
> This is not true, IMHO. If the last letter before the label is not
> an Arabic Letter, the label will be displayed as "123-456" in both
> LTR and RTL contexts. If it is an Arabic Letter, it will be
> displayed as "456-123".
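The distinction in comment 13 comes from rule W2 of [UAX9]: a European number (EN) whose nearest preceding strong character is an Arabic letter (AL) is resolved as an Arabic number (AN). The Python sketch below applies only W2 (the helper `apply_w2` is hypothetical, and the remaining resolution and reordering rules are omitted), which is enough to show how the digits of "123-456" change class.

```python
import unicodedata


def apply_w2(text):
    """Return the Bidi classes of `text` after applying rule W2 of [UAX9]
    alone: an EN whose nearest preceding strong character (L, R or AL)
    is an AL becomes AN. The start of text stands in for sos, which is
    never AL. All other resolution rules are skipped."""
    classes = []
    last_strong = None  # no strong character seen yet (sos)
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            last_strong = cls
        elif cls == "EN" and last_strong == "AL":
            cls = "AN"
        classes.append(cls)
    return classes


label = "123-456"
print(apply_w2(label))
# ['EN', 'EN', 'EN', 'ES', 'EN', 'EN', 'EN']
print(apply_w2("\u0627 " + label))  # ARABIC LETTER ALEF, space, label
# ['AL', 'WS', 'AN', 'AN', 'AN', 'ES', 'AN', 'AN', 'AN']
print(apply_w2("\u05d0 " + label))  # HEBREW LETTER ALEF, space, label
# ['R', 'WS', 'EN', 'EN', 'EN', 'ES', 'EN', 'EN', 'EN']
```

As we read [UAX9], once the digits are AN, rule W4 no longer turns the hyphen (ES) between them into a number; it becomes a neutral, takes direction R from the surrounding numbers under rule N1, and the label is shown as "456-123".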
>
> 14) In section 3, there appears the sentence: "The Label Uniqueness
> property should hold true between LTR paragraphs and RTL
> paragraphs. This was shown to be unsound."
> In fact, in all cases where Character Grouping and Label Uniqueness
> are satisfied for each paragraph direction separately, there will
> be Label Uniqueness between LTR and RTL paragraphs.
>
> 15) In section 3, since an "unproblematic label" can be a label
> which satisfies the requirements, the clause "any label S1 and S2
> that is either a label satisfying the requirements or an
> unproblematic label" can be shortened to "any label S1 and S2 that
> is an unproblematic label".
>
> 16) In the formal statement of the Label Uniqueness requirement,
> there is no provision (or exclusion) for the case where L and L'
> are identical.
>
> 17) In summary I suggest that the rules in section 2 should be
> reformulated as below.
>
> 1. Only characters with the Bidi properties R, AL, AN, EN, ES,
>    CS, ET, ON and NSM are allowed in RTL labels.
>
> 2. The first position must be a character with Bidi property R
>    or AL.
>
> 3. The last position must be a character with Bidi property R or
>    AL, followed by zero or more NSM.
>
> 3 variant. The last position must be a character with Bidi
>    property R, AL, EN or AN, followed by zero or more NSM.
>
> 4 (debatable). If an EN is present, no AN may be present, and vice
> versa.
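A minimal Python sketch of the rules proposed in comment 17, assuming that Python's `unicodedata.bidirectional` gives the Bidi property of each character (its values depend on the Unicode version bundled with the interpreter). The function name `check_rtl_label` and its two flags are hypothetical; the flags select the "3 variant" and the debatable rule 4.

```python
import unicodedata

# Bidi properties permitted by proposed rule 1.
ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "NSM"}


def check_rtl_label(label, allow_trailing_digits=False, forbid_en_an_mix=True):
    """Return the numbers of the proposed rules that `label` violates.

    An empty list means the label passes. allow_trailing_digits selects
    the "3 variant"; forbid_en_an_mix enables the debatable rule 4.
    """
    if not label:
        raise ValueError("label must not be empty")
    classes = [unicodedata.bidirectional(ch) for ch in label]
    failed = []

    # Rule 1: every character must have one of the allowed Bidi properties.
    if any(c not in ALLOWED for c in classes):
        failed.append(1)

    # Rule 2: the first character must be R or AL.
    if classes[0] not in ("R", "AL"):
        failed.append(2)

    # Rule 3 (or its variant): ignore trailing NSMs, then check the
    # last remaining character.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    enders = {"R", "AL", "EN", "AN"} if allow_trailing_digits else {"R", "AL"}
    if end == 0 or classes[end - 1] not in enders:
        failed.append(3)

    # Rule 4 (debatable): EN and AN must not both be present.
    if forbid_en_an_mix and "EN" in classes and "AN" in classes:
        failed.append(4)

    return failed


print(check_rtl_label("\u05d0\u05d1"))         # [] (two Hebrew letters, class R)
print(check_rtl_label("\u05d0\u05d11"))        # [3] (ends with EN digit "1")
print(check_rtl_label("\u05d0\u05d11", allow_trailing_digits=True))  # []
print(check_rtl_label("abc"))                  # [1, 2, 3]
print(check_rtl_label("\u0627\u06611\u0628"))  # [4] (AN and EN mixed)
```

Returning the list of failed rule numbers, rather than a single boolean, makes it easy to compare rule 3 with its variant, or to switch rule 4 off, on the same input.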
>
> It can be seen that this formulation is quite close to that in RFC
> 3454, while solving all the problems that the subject document aims
> to solve.
>
>
> Shalom (Regards), Mati
> Bidi Architect
> Globalization Center Of Competency - Bidirectional Scripts
> IBM Israel
> Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update