my comments on draft-ietf-idnabis-bidi-05
"Martin J. Dürst"
duerst at it.aoyama.ac.jp
Tue Sep 1 10:20:10 CEST 2009
Abstract: Should mention bidi rules first, then changes (this has been
fixed in the document itself, which is great).
Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are
1.1, para 2: "When labels satisfy the rule, and when certain other
conditions are satisfied, they can be used with a minimal chance of
these labels being displayed in a confusing way by a bidirectional
display algorithm.": "they" .. "these labels" is confusing. What about
"When labels satisfy the rule, and when certain other conditions are
satisfied, there is only a minimal chance that these labels will be
displayed in a confusing way by a bidirectional display algorithm."
1.1: "A bidirectional display algorithm": How many of them do we have?
(I only know one, the Unicode one (with some minor variants)). How many
of them have been used for testing/verification?
1.1, para 3: what exactly is a "right-to-left character"?
1.2: This section ideally should also be moved to after Section 2.
1.2, para 1: "The IDNA specification "Stringprep"": change to something
like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this
is an old spec.
1.2, para 4: "However, this makes certain words" -> "However, this made
certain words" (past tense)
1.2, para 7: "While the document specifies rules" -> "While this
document specifies rules"
1.2, para 7: "(the most important being label that mix Arabic and
European digits (AN and EN) inside an RTL label, and labels that use AN
in an LTR label)": Very weird. Such cases may not be completely
impossible, but they are much less frequent than e.g. Arabic numbers
inside Arabic letters, European numbers inside Arabic letters, and so
on. There was even a strong movement to prohibit number mixing at the
protocol level; this would never have happened if such mixing had been
deemed to be "most important". Also, after looking at the actual
conditions, we either have an RTL label, which by condition 4 excludes
mixing EN and AN, or we have an LTR label, which by condition 5 excludes
AN and therefore the mixture of EN and AN.
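To make the argument above concrete, here is a minimal Python sketch. It
is not the draft's bidi rule: it encodes only the two digit-related
conditions as I characterize them in this comment (an RTL label may not
mix EN and AN; an LTR label may not contain AN). Deciding RTL versus LTR
from the first character's Bidi class is an assumption of the sketch, and
the function names are hypothetical.

```python
import unicodedata


def label_direction(label):
    """Classify a label by its first character (an assumption of this sketch)."""
    if not label:
        return None
    first = unicodedata.bidirectional(label[0])
    if first in ("R", "AL"):
        return "RTL"
    if first == "L":
        return "LTR"
    return None


def mixes_en_and_an(label):
    """True if the label contains both European (EN) and Arabic (AN) digits."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return "EN" in classes and "AN" in classes


def digit_conditions_hold(label):
    """Check only the two digit-related conditions discussed in this comment."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    direction = label_direction(label)
    if direction == "RTL":
        # RTL label: EN and AN must not both be present.
        return not ("EN" in classes and "AN" in classes)
    if direction == "LTR":
        # LTR label: AN is not allowed at all.
        return "AN" not in classes
    return False


# Latin + EN, Latin + AN, Hebrew ALEF + EN, Arabic ALEF + AN, Arabic ALEF + AN + EN
samples = ["a5", "a\u0661", "\u05d05", "\u0627\u0661", "\u0627\u06615"]
for label in samples:
    ok = digit_conditions_hold(label)
    # No label that passes can mix EN and AN, in either direction.
    assert not (ok and mixes_en_and_an(label))
    print(ascii(label), label_direction(label), ok)
```

Under these assumptions, no label that passes can mix EN and AN, so the
"most important" cases named in the draft text cannot occur in a
conforming label.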
1.3, title: "Layout" -> "Structure" or "Organization"
1.3, para 1: Change from "bidi test" to "bidi rule". (or unify otherwise)
1.3, para 1: ", that" -> ", which"
1.3, para 1: "no matter what the direction of the label is": What does
this mean? It could either mean that you can apply the test forwards or
backwards, or it could mean that it doesn't depend on what
directionality the characters in the label have, or whatever. In the
latter case, I'd write e.g.: "This test [->rule, see above and below] can
be applied to any kind of label, but becomes trivial if the input is
guaranteed to contain only LTR characters."
1.3: "The primary initial use of that test": "that test" -> "this test"
(this sentence talks about relationship with other documents, so it's
the test in this document, not the test in that other section)
1.3, para 2: "a BIDI rule" -> "the BIDI rule"
1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are
talking about document organization, so it's "the rule in that other
section over there", so "here" doesn't fit)
1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section
7 describe": Section 8 is IANA consideration.
1.4: I have no problem following this stuff because I have worked on
bidi earlier, but somebody who's not familiar with BIDI will incur a
very steep learning curve. Either help a bit more with e.g. sentences
such as "for the purposes of bidirectional layout, each Unicode
character is assigned a BIDI property value."
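As an illustration of what such a sentence means in practice, here is a
small Python sketch that looks up the BIDI property value (Bidi_Class) of
a few characters using the standard unicodedata module. The helper name
and the sample characters are my own choice, not something from the
draft.

```python
import unicodedata


def bidi_classes(text):
    """Return (character name, Bidi_Class) pairs for each character in text."""
    return [
        (unicodedata.name(ch, "?"), unicodedata.bidirectional(ch))
        for ch in text
    ]


# Latin letter, European digit, Hebrew ALEF, Arabic ALEF,
# Arabic-Indic digit, FULL STOP
samples = "a5\u05d0\u0627\u0661."
for name, cls in bidi_classes(samples):
    print(f"{cls:>3}  {name}")
```

The values are a property of each character alone; how a digit such as
"5" ends up being displayed depends on its context, but its own property
value stays EN.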
1.4: "non spacing" -> "nonspacing"
1.4, "The directionality of such examples" -> "The display order of such
1.4, "it means ..., approximately" -> "it approximately means"
1.4 "An RTL label": This seems to be the definition that Protocol might
want to refer to.
1.4 'Having a separate category of "RTL domain names" would not make
this specification simpler, so has not been done.' -> 'Providing a
separate category of "RTL domain names" would not make this
Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are
used, that's confusing. The term is always in singular. The document
works that way in general, but "The following test" at the start of
Section 2 is confusing, because the only 'tests' that one can see are
the ones labeled 1. to 6. Maybe use something like "In order to pass the
BIDI test, the following conditions 1. to 6. must all be satisfied."
2, conditions 2/4: Why are BN (control characters) allowed in RTL but
not in LTR?
3. "A requirement" -> "The requirement" (see above)
3., para 2: As this restricts things to the Unicode bidi algorithm,
please say this earlier. (see above)
3., para 3: "requirements proposed" -> "requirements" (we are working on
finalizing this document, we are no longer in the proposal stage)
3., requirement 2: Is the choice of 'characters delimiting the labels'
open, is this only the ASCII dot, is this a small set (I'm interested in
this both for spec clarity and because the answer might strongly affect
3, 'possible requirement' related to directionality controls:
"(outside of the labels)" -> "(outside, but potentially directly
adjacent to the labels)" (does this include cases with directionality
controls inside a domain name, i.e. before/after a dot?)
"the conditions above require extra testing" -> "the conditions above
required extra testing"
3., 'Delimiterchars': FULL STOP not allowed in domain names?????
4.1, para 1: "This marking is obligatory, and both double vowels and
syllable-final consonants are indicated by the marking of special
unvoiced characters." -> "This marking is obligatory, and syllable-final
consonants are indicated by a special unvoiced character."
(double (long) vowels are indicated in Unicode by their own combining
mark, which is of course voiced. These are graphically in most cases
just duplications of the single (short) vowels. The current text
suggests a special "duplicate the preceding vowel" sign similar to the
one (sukun) for consonants, but such a suggestion is wrong.)
4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
4.2: This section could be shortened considerably. "Greater latitude
here than ... Dhivehi." is irrelevant; as long as a significant part of
a language's words cannot be used in IDN, there's a problem. The
subsection is interesting for people interested in Yiddish, but the
average reader of the spec will try to find something relevant for the
algorithm, and mostly be more confused than enlightened.
4.3: "(with the 5 being considered right-to-left because of the leading
ALEF)": No, the 5 itself is never right-to-left. Change to "(the overall
directionality being right-to-left because of the leading ALEF)"
4.3: "but barring them both seems to require justification" -> "but
barring them both seems unnecessary" or "but barring them both turned
out to be unnecessary"
5. "Even if a label is registered under a "safe" label,": 'under' should
be explained more clearly (I assume this refers to the hierarchical
relationship in the DNS)
5., last paragraph: It would be better to change this into a SHOULD,
such as "Where implementations see a way to avoid ..., they SHOULD
avoid". That will bring this issue onto the radar screen of implementers,
whereas it currently will just be glossed over.
6., first paragraph: "All other issues with these scripts": What scripts???
6. "wishes to create rules for the mixing of digits" -> "wishes to
create rules against the mixing of digits" or "wishes to restrict the
mixing of digits"
6. "Rules are also specified at the protocol level, but while the
example above involves right-to-left characters, this is not inherently
a BIDI problem." -> "This example is not inherently a BIDI problem, so
such restrictions are not specified at the protocol level."
("Rules are also specified at the protocol level" is inherently vague;
it seems to mean "Some rules against mixing digits are also specified at
the protocol level, but only when this is necessary to avoid a BIDI
6. "It is unrealistic to expect that applications will display domain
names using embedded formatting codes between their labels (for one
thing, no reliable algorithms for identifying domain names in running
text exist);": Please add that it is also unrealistic that formatting
codes are removed before IDNA processing, and that allowing formatting
codes could lead to many kinds of 'mischief' that would go against the
two requirements in section 3.
6. "which might surprise someone expecting to see labels displayed in
hierarchical order.": Please add that this may not be such a big problem
to general users familiar with BIDI, because they are used to
seeing/reading a sequence of RTL units (e.g. words) from right to left.
(for wording alternatives, see
http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second
7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really
farfetched (not impossible just because there is no guarantee against
weird implementations). It would be good to indicate that somehow.
(this includes the paragraph following bullet point 3)
7.1: "The editors believe": change to something less specific; this is a
WG document, we either have rough consensus or we don't. (I for one
fully agree with this point)
7.2: This should be slightly reworded to more clearly send the message
that changes to Unicode bidi properties, while not totally impossible,
are expected to be rare, and to affect mostly symbols and the like,
which will limit their effect on what the BIDI rule(/test) allows and
8. "It is possible that differences in the interpretation of the
specification": Wrong. There are no differences in interpretation for
the old spec. There are no differences in the interpretation of the new
spec. There are differences in the specs themselves.
P.S.: Unfortunately, I will not have time to review the remaining
documents (tables, rationale) during last call (or this week).
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp