my comments on draft-ietf-idnabis-bidi-05

Tue Sep 1 16:51:35 CEST 2009

Thanks for these recommendations, Martin.

Harald, please review Martin's suggestions for a final version of the
BiDi document.

vint

On Sep 1, 2009, at 4:20 AM, Martin J. Dürst wrote:

> Abstract: Should mention bidi rules first, then changes (this has been
> fixed in the document itself, which is great).
>
> Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are
> archival documents.
>
> 1.1, para 2: "When labels satisfy the rule, and when certain other
> conditions are satisfied, they can be used with a minimal chance of
> these labels being displayed in a confusing way by a bidirectional
> display algorithm.": "they" .. "these labels" is confusing. What about
> "When labels satisfy the rule, and when certain other conditions are
> satisfied, there is only a minimal chance that these labels will be
> displayed in a confusing way by a bidirectional display algorithm."
>
> 1.1: "A bidirectional display algorithm": How many of them do we have?
> (I only know one, the Unicode one (with some minor variants)). How many
> of them have been used for testing/verification?
>
> 1.1, para 3: what exactly is a "right-to-left character"?
>
> 1.2: This section ideally should also be moved to after Section 2.
>
> 1.2, para 1: "The IDNA specification "Stringprep"": change to something
> like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this
> is an old spec.
>
> 1.2, para 4: "However, this makes certain words" -> "However, this made
> certain words" (past tense)
>
> 1.2, para 7: "While the document specifies rules" -> "While this
> document specifies rules"
>
> 1.2, para 7: "(the most important being label that mix Arabic and
> European digits (AN and EN) inside an RTL label, and labels that use AN
> in an LTR label)": Very weird. Such cases may not be completely
> impossible, but they are much less frequent than e.g. Arabic numbers
> inside Arabic letters, European numbers inside Arabic letters, and so
> on. There was even a strong movement to prohibit number mixing at the
> protocol level; this would never have happened if such mixing had been
> deemed to be "most important". Also, after looking at the actual
> conditions, we either have an RTL label, which by condition 4 excludes
> mixing EN and AN, or we have an LTR label, which by condition 5 excludes
> AN and therefore the mixture of EN and AN.
>
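The digit argument in the 1.2, para 7 comment above can be sketched in Python. This is only an illustration of the two checks as the comment describes them (condition 4: an RTL label may not mix EN and AN; condition 5: an LTR label may not contain AN), not the draft's full rule. Two things are assumptions: that the label's direction is taken from the Bidi class of its first character, and that the standard-library unicodedata module is an acceptable source of Bidi classes. The function names are hypothetical.

```python
import unicodedata


def label_direction(label):
    """Return "RTL", "LTR", or None, judged from the first character only.

    Assumption: direction is decided by the first character's Bidi class.
    """
    if not label:
        return None
    first = unicodedata.bidirectional(label[0])
    if first in ("R", "AL"):
        return "RTL"
    if first == "L":
        return "LTR"
    return None


def digit_conditions_ok(label):
    """Check only the digit-related conditions mentioned in the comment."""
    direction = label_direction(label)
    if direction is None:
        return False
    classes = {unicodedata.bidirectional(ch) for ch in label}
    if direction == "RTL":
        # Condition 4 as described: EN and AN must not both be present.
        return not ({"EN", "AN"} <= classes)
    # Condition 5 as described: no AN at all in an LTR label.
    return "AN" not in classes


# Hebrew ALEF followed by a European digit and an Arabic-Indic digit:
# both EN and AN occur in an RTL label, so the check fails.
print(digit_conditions_ok("\u05d01\u0661"))
```

Under these assumptions neither kind of label can pass while containing both EN and AN, which is the point the comment makes.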
> 1.3, title: "Layout" -> "Structure" or "Organization"
>
> 1.3, para 1: Change from "bidi test" to "bidi rule" (or unify
> otherwise).
>
> 1.3, para 1: ", that" -> ", which"
>
> 1.3, para 1: "no matter what the direction of the label is": What does
> this mean? It could either mean that you can apply the test forwards or
> backwards, or it could mean that it doesn't depend on what
> directionality the characters in the label have, or whatever. In the
> latter case, I'd write e.g.: "This test [->rule, see above and below]
> can be applied to any kind of label, but becomes trivial if the input is
> guaranteed to contain only LTR characters."
>
> 1.3: "The primary initial use of that test": "that test" -> "this test"
> (this sentence talks about relationship with other documents, so it's
> the test in this document, not the test in that other section)
>
> 1.3, para 2: "a BIDI rule" -> "the BIDI rule"
>
> 1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are
> talking about document organization, so it's "the rule in that other
> section over there", so "here" doesn't fit)
>
> 1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section
> 7 describe": Section 8 is IANA consideration.
>
> 1.4: I have no problem following this stuff because I have worked on
> bidi earlier, but somebody who's not familiar with BIDI will incur a
> very steep learning curve. Please help a bit more, e.g. with sentences
> such as "for the purposes of bidirectional layout, each Unicode
> character is assigned a BIDI property value."
>
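As a small illustration of the sentence suggested for 1.4 above, the per-character BIDI property value can be looked up programmatically. This is a minimal Python sketch, assuming the standard-library unicodedata module (whose data reflects whatever Unicode version the Python build ships with); the helper name bidi_classes is made up for this example and does not come from the draft.

```python
import unicodedata


def bidi_classes(text):
    """Return the Bidi class abbreviation of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letter, European digit, Hebrew ALEF, Arabic ALEF, Arabic-Indic digit
print(bidi_classes("a1\u05d0\u0627\u0661"))
# prints ['L', 'EN', 'R', 'AL', 'AN']
```

The abbreviations returned (L, EN, R, AL, AN and so on) are the same ones used in the conditions discussed in these comments.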
> 1.4: "non spacing" -> "nonspacing"
>
> 1.4, "The directionality of such examples" -> "The display order of
> such examples"
>
> 1.4, "it means ..., approximately" -> "it approximately means"
>
> 1.4 "An RTL label": This seems to be the definition that Protocol might
> want to refer to.
>
> 1.4 'Having a separate category of "RTL domain names" would not make
> this specification simpler, so has not been done.' -> 'Providing a
> separate category of "RTL domain names" would not make this
> specification simpler.'
>
> Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are
> used; that's confusing. The term is always in singular. The document
> works that way in general, but "The following test" at the start of
> Section 2 is confusing, because the only 'tests' that one can see are
> the ones labeled 1. to 6. Maybe use something like "In order to pass
> the BIDI test, the following conditions 1. to 6. must all be satisfied."
>
> 2, conditions 2/4: Why are BN (control characters) allowed in RTL but
> not in LTR?
>
> 3. "A requirement" -> "The requirement" (see above)
>
> 3., para 2: As this restricts things to the Unicode bidi algorithm,
> please say this earlier. (see above)
>
> 3., para 3: "requirements proposed" -> "requirements" (we are working
> on finalizing this document, we are no longer in the proposal stage)
>
> 3., requirement 2: Is the choice of 'characters delimiting the labels'
> open, is this only the ASCII dot, or is this a small set? (I'm
> interested in this both for spec clarity and because the answer might
> strongly affect draft-duerst-iri-bis.)
>
> 3, 'possible requirement' related to directionality controls:
> "(outside of the labels)" -> "(outside of, but potentially directly
> adjacent to, the labels)" (does this include cases with directionality
> controls inside a domain name, i.e. before/after a dot?)
> "the conditions above require extra testing" -> "the conditions above
> required extra testing"
>
> 3., 'Delimiterchars': FULL STOP not allowed in domain names?????
>
> 4.1, para 1: "This marking is obligatory, and both double vowels and
> syllable-final consonants are indicated by the marking of special
> unvoiced characters." -> "This marking is obligatory, and
> syllable-final consonants are indicated by a special unvoiced
> character."
> (double (long) vowels are indicated in Unicode by their own combining
> mark, which is of course voiced. These are graphically in most cases
> just duplications of the single (short) vowels. The current text
> suggests a special "duplicate the preceding vowel" sign similar to the
> one (sukun) for consonants, but such a suggestion is wrong.)
>
> 4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
>
> 4.2: This section could be shortened considerably. "Greater latitude
> here than ... Dhivehi." is irrelevant; as long as a significant part of
> a language's words cannot be used in IDN, there's a problem. The
> subsection is interesting for people interested in Yiddish, but the
> average reader of the spec will try to find something relevant for the
> algorithm, and mostly be more confused than enlightened.
>
> 4.3: "(with the 5 being considered right-to-left because of the leading
> ALEF)": No, the 5 itself is never right-to-left. Change to "(the
> overall directionality being right-to-left because of the leading ALEF)"
>
> 4.3: "but barring them both seems to require justification" -> "but
> barring them both seems unnecessary" or "but barring them both turned
> out to be unnecessary"
>
> 5. "Even if a label is registered under a "safe" label,": 'under'
> should be explained more clearly (I assume this refers to the
> hierarchical relationship in the DNS)
>
> 5., last paragraph: It would be better to change this into a SHOULD,
> such as "Where implementations see a way to avoid ..., they SHOULD
> avoid". That will bring this issue onto the radar screen of
> implementers, whereas it currently will just be glossed over.
>
> 6., first paragraph: "All other issues with these scripts": What
> scripts???
>
> 6. "wishes to create rules for the mixing of digits" -> "wishes to
> create rules against the mixing of digits" or "wishes to restrict the
> mixing of digits"
>
> 6. "Rules are also specified at the protocol level, but while the
> example above involves right-to-left characters, this is not inherently
> a BIDI problem." -> "This example is not inherently a BIDI problem, so
> such restrictions are not specified at the protocol level."
> ("Rules are also specified at the protocol level" is inherently vague;
> it seems to mean "Some rules against mixing digits are also specified
> at the protocol level, but only when this is necessary to avoid a BIDI
> problem.")
>
> 6. "It is unrealistic to expect that applications will display domain
> names using embedded formatting codes between their labels (for one
> thing, no reliable algorithms for identifying domain names in running
> text exist);": Please add that it is also unrealistic that formatting
> codes are removed before IDNA processing, and that allowing formatting
> codes could lead to many kinds of 'mischief' that would go against the
> two requirements in section 3.
>
> 6. "which might surprise someone expecting to see labels displayed in
> hierarchical order.": Please add that this may not be such a big
> problem to general users familiar with BIDI, because they are used to
> seeing/reading a sequence of RTL units (e.g. words) from right to left.
> (for wording alternatives, see
> http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second
> para*, ...)
>
> 7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really
> farfetched (not impossible just because there is no guarantee against
> weird implementations). It would be good to indicate that somehow.
> (this includes the paragraph following bullet point 3)
>
> 7.1: "The editors believe": change to something less specific; this is
> a WG document, we either have rough consensus or we don't. (I for one
> fully agree with this point)
>
> 7.2: This should be slightly reworded to more clearly send the message
> that changes to Unicode bidi properties, while not totally impossible,
> are expected to be rare, and to affect mostly symbols and the like,
> which will limit their effect on what the BIDI rule(/test) allows and
> what not.
>
> 8. "It is possible that differences in the interpretation of the
> specification": Wrong. There are no differences in interpretation for
> the old spec. There are no differences in the interpretation of the new
> spec. There are differences in the specs themselves.
>
> Regards,   Martin.
>
> P.S.: Unfortunately, I will not have time to review the remaining
> documents (tables, rationale) during last call (or this week).
>
> -- 
> #-# Martin J. Dürst, Professor, Aoyama Gakuin University
> #-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp