my comments on draft-ietf-idnabis-bidi-05
vint at google.com
Tue Sep 1 16:51:35 CEST 2009
Thanks for these recommendations, Martin.
Harald, please review Martin's suggestions for a final version of BiDi.
On Sep 1, 2009, at 4:20 AM, Martin J. Dürst wrote:
> Abstract: Should mention bidi rules first, then changes (this has been
> fixed in the document itself, which is great).
> Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are
> archival documents.
> 1.1, para 2: "When labels satisfy the rule, and when certain other
> conditions are satisfied, they can be used with a minimal chance of
> these labels being displayed in a confusing way by a bidirectional
> display algorithm.": "they" .. "these labels" is confusing. What about
> "When labels satisfy the rule, and when certain other conditions are
> satisfied, there is only a minimal chance that these labels will be
> displayed in a confusing way by a bidirectional display algorithm."
> 1.1: "A bidirectional display algorithm": How many of them do we have?
> (I only know one, the Unicode one (with some minor variants)). How many
> of them have been used for testing/verification?
> 1.1, para 3: what exactly is a "right-to-left character"?
> 1.2: This section ideally should also be moved to after Section 2.
> 1.2, para 1: "The IDNA specification "Stringprep"": change to something
> like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this
> is an old spec.
> 1.2, para 4: "However, this makes certain words" -> "However, this made
> certain words" (past tense)
> 1.2, para 7: "While the document specifies rules" -> "While this
> document specifies rules"
> 1.2, para 7: "(the most important being labels that mix Arabic and
> European digits (AN and EN) inside an RTL label, and labels that use AN
> in an LTR label)": Very weird. Such cases may not be completely
> impossible, but they are much less frequent than e.g. Arabic numbers
> inside Arabic letters, European numbers inside Arabic letters, and so
> on. There was even a strong movement to prohibit number mixing at the
> protocol level; this would never have happened if such mixing would have
> been deemed to be "most important". Also, after looking at the actual
> conditions, we either have an RTL label, which by condition 4 excludes
> mixing EN and AN, or we have an LTR label, which by condition 5 excludes
> AN and therefore the mixture of EN and AN.
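The argument about conditions 4 and 5 can be checked mechanically. The sketch below, assuming Python and its standard `unicodedata` module, encodes only the two digit restrictions as described in the comment above (an RTL label may not mix EN and AN; an LTR label may not contain AN). It is not the full six-condition rule, the function names are hypothetical, and the label direction is passed in by the caller rather than derived from the draft's definitions.

```python
import unicodedata

def digit_classes(label: str) -> set[str]:
    """Return which of the digit Bidi classes EN and AN occur in the label."""
    return {unicodedata.bidirectional(ch) for ch in label} & {"EN", "AN"}

def digits_allowed(label: str, rtl: bool) -> bool:
    """Apply only the digit restrictions attributed to conditions 4 and 5."""
    found = digit_classes(label)
    if rtl:
        # RTL label: EN and AN must not both be present.
        return found != {"EN", "AN"}
    # LTR label: AN must not be present at all.
    return "AN" not in found
```

Under these two checks a label that mixes EN and AN is rejected whichever direction it is given, which is the point made in the comment.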
> 1.3, title: "Layout" -> "Structure" or "Organization"
> 1.3, para 1: Change from "bidi test" to "bidi rule". (or unify
> 1.3, para 1: ", that" -> ", which"
> 1.3, para 1: "no matter what the direction of the label is": What does
> this mean? It could either mean that you can apply the test forwards or
> backwards, or it could mean that it doesn't depend on what
> directionality the characters in the label have, or whatever. In the
> latter case, I'd write e.g.: "This test [->rule, see above and below] can
> be applied to any kind of label, but becomes trivial if the input is
> guaranteed to contain only LTR characters."
> 1.3: "The primary initial use of that test": "that test" -> "this test"
> (this sentence talks about relationship with other documents, so it's
> the test in this document, not the test in that other section)
> 1.3, para 2: "a BIDI rule" -> "the BIDI rule"
> 1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are
> talking about document organization, so it's "the rule in that other
> section over there", so "here" doesn't fit)
> 1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section
> 7 describe": Section 8 is IANA consideration.
> 1.4: I have no problem following this stuff because I have worked on
> bidi earlier, but somebody who's not familiar with BIDI will incur a
> very steep learning curve. Either help a bit more with e.g. sentences
> such as "for the purposes of bidirectional layout, each Unicode
> character is assigned a BIDI property value."
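A minimal sketch of what "each Unicode character is assigned a BIDI property value" looks like in practice, assuming Python and its standard `unicodedata` module. The helper name `bidi_classes` is made up for illustration and does not come from the draft.

```python
import unicodedata

def bidi_classes(label: str) -> list[tuple[str, str]]:
    """Pair each character (as U+XXXX) with its Unicode Bidi_Class value."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in label]

if __name__ == "__main__":
    # Hebrew ALEF, Latin 'a', European digit 5, Arabic-Indic digit 5
    for code_point, bidi in bidi_classes("\u05D0a5\u0665"):
        print(code_point, bidi)
```

The four sample characters come out as R, L, EN and AN respectively, which are the property values the comments above refer to.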
> 1.4: "non spacing" -> "nonspacing"
> 1.4, "The directionality of such examples" -> "The display order of
> such examples"
> 1.4, "it means ..., approximately" -> "it approximately means"
> 1.4 "An RTL label": This seems to be the definition that Protocol
> want to refer to.
> 1.4 'Having a separate category of "RTL domain names" would not make
> this specification simpler, so has not been done.' -> 'Providing a
> separate category of "RTL domain names" would not make this
> specification simpler.'
> Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are
> used, that's confusing. The term is always in singular. The document
> works that way in general, but "The following test" at the start of
> Section 2 is confusing, because the only 'tests' that one can see are
> the ones labeled 1. to 6. Maybe use something like "In order to pass the
> BIDI test, the following conditions 1. to 6. must all be satisfied."
> 2, conditions 2/4: Why are BN (control characters) allowed in RTL but
> not in LTR?
> 3. "A requirement" -> "The requirement" (see above)
> 3., para 2: As this restricts things to the Unicode bidi algorithm,
> please say this earlier. (see above)
> 3., para 3: "requirements proposed" -> "requirements" (we are working on
> finalizing this document, we are no longer in the proposal stage)
> 3., requirement 2: Is the choice of 'characters delimiting the labels'
> open, is this only the ASCII dot, is this a small set (I'm interested in
> this both for spec clarity and because the answer might strongly
> 3, 'possible requirement' related to directionality controls:
> "(outside of the labels)" -> "(outside, but potentially directly
> adjacent to the labels)" (does this include cases with directionality
> controls inside a domain name, i.e. before/after a dot?)
> "the conditions above require extra testing" -> "the conditions above
> required extra testing"
> 3., 'Delimiterchars': FULL STOP not allowed in domain names?????
> 4.1, para 1: "This marking is obligatory, and both double vowels and
> syllable-final consonants are indicated by the marking of special
> unvoiced characters." -> "This marking is obligatory, and syllable-final
> consonants are indicated by a special unvoiced character."
> (double (long) vowels are indicated in Unicode by their own combining
> mark, which is of course voiced. These are graphically in most cases
> just duplications of the single (short) vowels. The current text
> suggests a special "duplicate the preceding vowel" sign similar to the
> one (sukun) for consonants, but such a suggestion is wrong.)
> 4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
> 4.2: This section could be shortened considerably. "Greater latitude
> here than ... Dhivehi." is irrelevant; as long as a significant part of
> a language's words cannot be used in IDN, there's a problem. The
> subsection is interesting for people interested in Yiddish, but the
> average reader of the spec will try to find something relevant for the
> algorithm, and mostly be more confused than enlightened.
> 4.3: "(with the 5 being considered right-to-left because of the
> ALEF)": No, the 5 itself is never right-to-left. Change to "(the
> directionality being right-to-left because of the leading ALEF)"
> 4.3: "but barring them both seems to require justification" -> "but
> barring them both seems unnecessary" or "but barring them both turned
> out to be unnecessary"
> 5. "Even if a label is registered under a "safe" label,": 'under' should
> be explained more clearly (I assume this refers to the hierarchical
> relationship in the DNS)
> 5., last paragraph: It would be better to change this into a SHOULD,
> such as "Where implementations see a way to avoid ..., they SHOULD
> avoid". That will bring this issue on the radar screen of implementers,
> whereas it currently will just be glossed over.
> 6., first paragraph: "All other issues with these scripts": What
> 6. "wishes to create rules for the mixing of digits" -> "wishes to
> create rules against the mixing of digits" or "wishes to restrict the
> mixing of digits"
> 6. "Rules are also specified at the protocol level, but while the
> example above involves right-to-left characters, this is not inherently
> a BIDI problem." -> "This example is not inherently a BIDI problem, so
> such restrictions are not specified at the protocol level."
> ("Rules are also specified at the protocol level" is inherently vague;
> it seems to mean "Some rules against mixing digits are also specified at
> the protocol level, but only when this is necessary to avoid a BIDI
> 6. "It is unrealistic to expect that applications will display domain
> names using embedded formatting codes between their labels (for one
> thing, no reliable algorithms for identifying domain names in running
> text exist);": Please add that it is also unrealistic that formatting
> codes are removed before IDNA processing, and that allowing formatting
> codes could lead to many kinds of 'mischief' that would go against the
> two requirements in section 3.
> 6. "which might surprise someone expecting to see labels displayed in
> hierarchical order.": Please add that this may not be such a big
> to general users familiar with BIDI, because they are used to
> seeing/reading a sequence of RTL units (e.g. words) from right to left
> (for wording alternatives, see
> http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second
> para*, ...)
> 7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really
> farfetched (not impossible just because there is no guarantee against
> weird implementations). It would be good to indicate that somehow.
> (this includes the paragraph following bullet point 3)
> 7.1: "The editors believe": change to something less specific; this is a
> WG document, we either have rough consensus or we don't. (I for one
> fully agree with this point)
> 7.2: This should be slightly reworded to more clearly send the message
> that changes to Unicode bidi properties, while not totally impossible,
> are expected to be rare, and to affect mostly symbols and the like,
> which will limit their effect on what the BIDI rule(/test) allows and
> what not.
> 8. "It is possible that differences in the interpretation of the
> specification": Wrong. There are no differences in interpretation for
> the old spec. There are no differences in the interpretation of the new
> spec. There are differences in the specs themselves.
> Regards, Martin.
> P.S.: Unfortunately, I will not have time to review the remaining
> documents (tables, rationale) during last call (or this week).
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp mailto:duerst at it.aoyama.ac.jp