my comments on draft-ietf-idnabis-bidi-05

Tue Sep 1 10:20:10 CEST 2009

Abstract: Should mention bidi rules first, then changes (this has been 
fixed in the document itself, which is great).

Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are 
archival documents.

1.1, para 2: "When labels satisfy the rule, and when certain other 
conditions are satisfied, they can be used with a minimal chance of 
these labels being displayed in a confusing way by a bidirectional 
display algorithm.": "they" .. "these labels" is confusing. What about
"When labels satisfy the rule, and when certain other conditions are 
satisfied, there is only a minimal chance that these labels will be 
displayed in a confusing way by a bidirectional display algorithm."

1.1: "A bidirectional display algorithm": How many of them do we have? 
(I only know one, the Unicode one (with some minor variants)). How many 
of them have been used for testing/verification?

1.1, para 3: what exactly is a "right-to-left character"?

1.2: This section ideally should also be moved to after Section 2.

1.2, para 1: "The IDNA specification "Stringprep"": change to something 
like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this 
is an old spec.

1.2, para 4: "However, this makes certain words" -> "However, this made 
certain words" (past tense)

1.2, para 7: "While the document specifies rules" -> "While this 
document specifies rules"

1.2, para 7: "(the most important being label that mix Arabic and 
European digits (AN and EN) inside an RTL label, and labels that use AN 
in an LTR label)": Very weird. Such cases may not be completely 
impossible, but they are much less frequent than e.g. Arabic numbers 
inside Arabic letters, European numbers inside Arabic letters, and so 
on. There was even a strong movement to prohibit number mixing at the 
protocol level; this would never have happened if such mixing had 
been deemed "most important". Also, looking at the actual 
conditions, we either have an RTL label, where condition 4 excludes 
mixing EN and AN, or we have an LTR label, where condition 5 excludes 
AN and therefore any mixture of EN and AN.
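
To make the last point concrete, here is a rough Python sketch of only 
the digit-related parts of conditions 4 and 5, as I read them. The 
function names are mine, and taking the label direction from the first 
character (R or AL means RTL, L means LTR) is my assumption about 
condition 1. The property values come from the Unicode data shipped with 
Python, which may be a different Unicode version than the one the draft 
refers to.

```python
import unicodedata

def label_direction(label):
    """Assumed reading of condition 1: direction comes from the first character."""
    if not label:
        return None
    first = unicodedata.bidirectional(label[0])
    if first in ("R", "AL"):
        return "RTL"
    if first == "L":
        return "LTR"
    return None  # neither L, R, nor AL: fails condition 1 under this reading

def digit_mixing_allowed(label):
    """Check only the digit-related parts of conditions 4 and 5."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    direction = label_direction(label)
    if direction == "RTL":
        # Condition 4: EN and AN must not both occur in an RTL label.
        return not ({"EN", "AN"} <= classes)
    if direction == "LTR":
        # Condition 5: AN is not among the types allowed in an LTR label.
        return "AN" not in classes
    return False
```

With this, an ARABIC LETTER ALEF followed by either an Arabic-Indic digit 
or a European digit passes, but no label that gets through both branches 
can contain EN and AN together, so the "most important" cases cannot 
occur in a label that satisfies the rule.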

1.3, title: "Layout" -> "Structure" or "Organization"

1.3, para 1: Change from "bidi test" to "bidi rule". (or unify otherwise)

1.3, para 1: ", that" -> ", which"

1.3, para 1: "no matter what the direction of the label is": What does 
this mean? It could mean that you can apply the test forwards or 
backwards, or that it doesn't depend on what directionality the 
characters in the label have, or something else again. In the 
second case, I'd write e.g.: "This test [->rule, see above and below] can 
be applied to any kind of label, but becomes trivial if the input is 
guaranteed to contain only LTR characters."

1.3: "The primary initial use of that test": "that test" -> "this test" 
(this sentence talks about relationship with other documents, so it's 
the test in this document, not the test in that other section)

1.3, para 2: "a BIDI rule" -> "the BIDI rule"

1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are 
talking about document organization, so it's "the rule in that other 
section over there", so "here" doesn't fit)

1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section 
7 describe": Section 8 is IANA Considerations.

1.4: I have no problem following this stuff because I have worked on 
bidi earlier, but somebody who's not familiar with BIDI will face a 
very steep learning curve. Please help a bit more, e.g. with sentences 
such as "for the purposes of bidirectional layout, each Unicode 
character is assigned a BIDI property value."
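
For readers who have never looked at these property values, a small 
Python sketch may help. It only prints the value that Python's bundled 
Unicode data assigns to a few characters; the choice of sample 
characters and the helper name bidi_table are mine, not the draft's.

```python
import unicodedata

# Sample characters picked for illustration only.
samples = {
    "LATIN SMALL LETTER A": "a",
    "DIGIT FIVE": "5",
    "HEBREW LETTER ALEF": "\u05d0",
    "ARABIC LETTER ALEF": "\u0627",
    "ARABIC-INDIC DIGIT FIVE": "\u0665",
    "FULL STOP": ".",
}

def bidi_table(chars):
    """Map each character name to its Unicode BIDI property value."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}

for name, value in bidi_table(samples).items():
    print(f"{name}: {value}")
```

The values printed (L, EN, R, AL, AN, CS) are exactly the kind of 
abbreviations that section 1.4 uses without much introduction.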

1.4: "non spacing" -> "nonspacing"

1.4, "The directionality of such examples" -> "The display order of such 
examples"

1.4, "it means ..., approximately" -> "it approximately means"

1.4 "An RTL label": This seems to be the definition that Protocol might 
want to refer to.

1.4 'Having a separate category of "RTL domain names" would not make 
this specification simpler, so has not been done.' -> 'Providing a 
separate category of "RTL domain names" would not make this 
specification simpler.'

Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are 
used, which is confusing. The term is always in the singular. The document 
works that way in general, but "The following test" at the start of 
Section 2 is confusing, because the only 'tests' that one can see are 
the ones labeled 1. to 6. Maybe use something like "In order to pass the 
BIDI test, the following conditions 1. to 6. must all be satisfied."

2, conditions 2/4: Why are BN (control characters) allowed in RTL but 
not in LTR?

3. "A requirement" -> "The requirement" (see above)

3., para 2: As this restricts things to the Unicode bidi algorithm, 
please say this earlier. (see above)

3., para 3: "requirements proposed" -> "requirements" (we are working on 
finalizing this document, we are no longer in the proposal stage)

3., requirement 2: Is the choice of 'characters delimiting the labels' 
open, is it only the ASCII dot, or is it a small set? (I'm interested in 
this both for spec clarity and because the answer might strongly affect 
draft-duerst-iri-bis.)

3, 'possible requirement' related to directionality controls:
"(outside of the labels)" -> "(outside of, but potentially directly 
adjacent to, the labels)" (does this include cases with directionality 
controls inside a domain name, i.e. before/after a dot?)
"the conditions above require extra testing" -> "the conditions above 
required extra testing"

3., 'Delimiterchars': FULL STOP not allowed in domain names?????

4.1, para 1: "This marking is obligatory, and both double vowels and 
syllable-final consonants are indicated by the marking of special 
unvoiced characters." -> "This marking is obligatory, and syllable-final 
consonants are indicated by a special unvoiced character."
(double (long) vowels are indicated in Unicode by their own combining 
mark, which is of course voiced. These are graphically in most cases 
just duplications of the single (short) vowels. The current text 
suggests a special "duplicate the preceding vowel" sign similar to the 
one (sukun) for consonants, but such a suggestion is wrong.)

4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"

4.2: This section could be shortened considerably. "Greater latitude 
here than ... Dhivehi." is irrelevant; as long as a significant part of 
a language's words cannot be used in IDN, there's a problem. The 
subsection is interesting for people interested in Yiddish, but the 
average reader of the spec will try to find something relevant for the 
algorithm, and mostly be more confused than enlightened.

4.3: "(with the 5 being considered right-to-left because of the leading 
ALEF)": No, the 5 itself is never right-to-left. Change to "(the overall 
directionality being right-to-left because of the leading ALEF)"
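
A quick way to check this, sketched in Python: the digit keeps its own 
property value (EN), and only the direction derived from the first 
strong character is right-to-left. The helper base_direction is my own 
simplification (first character of type L, R, or AL), not the full 
Unicode bidi algorithm, and I am assuming HEBREW LETTER ALEF here; 
ARABIC LETTER ALEF (type AL) gives the same result.

```python
import unicodedata

def base_direction(text):
    """Direction of the first strong character (L, R, or AL), else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "LTR"
        if bidi_class in ("R", "AL"):
            return "RTL"
    return None  # no strong character at all, e.g. digits only

alef_five = "\u05d0" + "5"  # HEBREW LETTER ALEF followed by DIGIT FIVE

print(base_direction(alef_five))
print(unicodedata.bidirectional("5"))
```

The first line printed is the overall direction, the second is the 
property value of the 5 itself, which does not change with context.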

4.3: "but barring them both seems to require justification" -> "but 
barring them both seems unnecessary" or "but barring them both turned 
out to be unnecessary"

5. "Even if a label is registered under a "safe" label,": 'under' should 
be explained more clearly (I assume this refers to the hierarchical 
relationship in the DNS)

5., last paragraph: It would be better to change this into a SHOULD, 
such as "Where implementations see a way to avoid ..., they SHOULD 
avoid". That will put this issue on the radar screen of implementers, 
whereas it currently will just be glossed over.

6., first paragraph: "All other issues with these scripts": What scripts???

6. "wishes to create rules for the mixing of digits" -> "wishes to 
create rules against the mixing of digits" or "wishes to restrict the 
mixing of digits"

6. "Rules are also specified at the protocol level, but while the 
example above involves right-to-left characters, this is not inherently 
a BIDI problem." -> "This example is not inherently a BIDI problem, so 
such restrictions are not specified at the protocol level."
("Rules are also specified at the protocol level" is inherently vague; 
it seems to mean "Some rules against mixing digits are also specified at 
the protocol level, but only when this is necessary to avoid a BIDI 
problem.")

6. "It is unrealistic to expect that applications will display domain 
names using embedded formatting codes between their labels (for one 
thing, no reliable algorithms for identifying domain names in running 
text exist);": Please add that it is also unrealistic that formatting 
codes are removed before IDNA processing, and that allowing formatting 
codes could lead to many kinds of 'mischief' that would go against the 
two requirements in section 3.

6. "which might surprise someone expecting to see labels displayed in 
hierarchical order.": Please add that this may not be such a big problem 
to general users familiar with BIDI, because they are used to 
seeing/reading a sequence of RTL units (e.g. words) from right to left.
(for wording alternatives, see 
http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second 
para*, ...)

7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really 
far-fetched (it is not impossible only because there is no guarantee 
against weird implementations). It would be good to indicate that somehow.
(this includes the paragraph following bullet point 3)

7.1: "The editors believe": change to something less specific; this is a 
WG document, we either have rough consensus or we don't. (I for one 
fully agree with this point)

7.2: This should be slightly reworded to more clearly send the message 
that changes to Unicode bidi properties, while not totally impossible, 
are expected to be rare, and to affect mostly symbols and the like, 
which will limit their effect on what the BIDI rule(/test) allows and 
what not.

8. "It is possible that differences in the interpretation of the 
specification": Wrong. There are no differences in interpretation for 
the old spec. There are no differences in the interpretation of the new 
spec. There are differences in the specs themselves.

Regards,   Martin.

P.S.: Unfortunately, I will not have time to review the remaining 
documents (tables, rationale) during last call (or this week).

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp