my comments on draft-ietf-idnabis-bidi-05

Tue Sep 8 22:27:29 CEST 2009

Apologies for being a week late in responding. I'll try to respond to 
those issues that haven't already been beaten to death.

(Cary - I need your expertise for a couple of the issues. Please help!)

Martin J. Dürst wrote:
> Abstract: Should mention bidi rules first, then changes (this has been 
> fixed in the document itself, which is great).
>
> Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are 
> archival documents.
>   
Will try. Am going blind by now to such things, though....
> 1.1, para 2: "When labels satisfy the rule, and when certain other 
> conditions are satisfied, they can be used with a minimal chance of 
> these labels being displayed in a confusing way by a bidirectional 
> display algorithm.": "they" .. "these labels" is confusing. What about
> "When labels satisfy the rule, and when certain other conditions are 
> satisfied, there is only a minimal chance that these labels will be 
> displayed in a confusing way by a bidirectional display algorithm."
>   
Will do.
> 1.1: "A bidirectional display algorithm": How many of them do we have? 
> (I only know one, the Unicode one (with some minor variants)). How many 
> of them have been used for testing/verification?
>
> 1.1, para 3: what exactly is a "right-to-left character"?
>   
Type R or AL. I'll add this to definitions. AN is a bit weird.
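
As a rough illustration of what that definition would look like in practice, here is a minimal Python sketch using the standard unicodedata module. The helper name is_rtl_char is made up for this mail, and the result depends on the Unicode version shipped with the Python build.

```python
import unicodedata

def is_rtl_char(ch: str) -> bool:
    """True if the character's Bidi class is R or AL (hypothetical helper)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

# Hebrew ALEF is class R, Arabic ALEF is class AL.
print(is_rtl_char("\u05D0"), is_rtl_char("\u0627"))
# ARABIC-INDIC DIGIT ONE is class AN, so it does not count under this definition.
print(is_rtl_char("\u0661"))
```

Note that AN characters fall outside this check, which is the weird part mentioned above.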
> 1.2: This section ideally should also be moved to after Section 2.
>   
I don't agree; I prefer to have the context-setting done first, and 
splitting the section to sort out what's not necessary context-setting 
is more work than I care for.
> 1.2, para 1: "The IDNA specification "Stringprep"": change to something 
> like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this 
> is an old spec.
>   
Will do. Will also change the entire 2003 description to past tense.
> 1.2, para 4: "However, this makes certain words" -> "However, this made 
> certain words" (past tense)
>
> 1.2, para 7: "While the document specifies rules" -> "While this 
> document specifies rules"
>   
Will do.
> 1.2, para 7: "(the most important being label that mix Arabic and 
> European digits (AN and EN) inside an RTL label, and labels that use AN 
> in an LTR label)": Very weird. Such cases may not be completely 
> impossible, but they are much less frequent than e.g. Arabic numbers 
> inside Arabic letters, European numbers inside Arabic letters, and so 
> on. There was even a strong movement to prohibit number mixing at the 
> protocol level; this would never have happened if such mixing would have 
> been deemed to be "most important". Also, after looking at the actual 
> conditions, we either have an RTL label, which by condition 4 excludes 
> mixing EN and AN, or we have an LTR label, which by condition 5 excludes 
> AN and therefore the mixture of EN and AN.
>   
The commentary on version 4 asked for specific examples of strings that 
were allowed under IDNA2003's BIDI rule, but disallowed under this 
specification. This is it.

Is it possible to make this clearer?
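
To make the two cases concrete, here is a rough Python sketch. It assumes that a label's direction is taken from the Bidi class of its first character (R or AL for RTL, L for LTR); the function name is invented for this mail and is not from the draft. It flags only the two patterns named in the quoted text, not the full rule.

```python
import unicodedata

def digit_mixing_case(label: str):
    """Return which of the two quoted cases a label falls into, or None.

    Assumption: direction is decided by the first character's Bidi class.
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return None
    first = classes[0]
    if first in ("R", "AL"):
        # RTL label that mixes Arabic (AN) and European (EN) digits.
        if "AN" in classes and "EN" in classes:
            return "RTL label mixing AN and EN"
    elif first == "L":
        # LTR label that uses Arabic digits (AN).
        if "AN" in classes:
            return "LTR label containing AN"
    return None

# Hebrew ALEF + European digit + Arabic-Indic digit
print(digit_mixing_case("\u05D01\u0661"))
# Latin letter + Arabic-Indic digit
print(digit_mixing_case("a\u0661"))
```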
> 1.3, title: "Layout" -> "Structure" or "Organization"
>   
I prefer "layout", but "structure" is also OK with me. Will change.
> 1.3, para 1: Change from "bidi test" to "bidi rule". (or unify otherwise)
>
> 1.3, para 1: ", that" -> ", which"
>   
When to use "which" and when to use "that" seems to be a bone of 
contention among linguistically-competent people.
> 1.3, para 1: "no matter what the direction of the label is": What does 
> this mean? It could either mean that you can apply the test forwards or 
> backwards, or it could mean that it doesn't depend on what 
> directionality the characters in the label have, or whatever. In the 
> latter case, I'd write e.g.: "This test [->rule, see above and below] can 
> be applied to any kind of label, but becomes trivial if the input is 
> guaranteed to contain only LTR characters."
>   
It means that you can apply the test to both RTL and LTR labels.
It's not trivial for LTR labels either.
> 1.3: "The primary initial use of that test": "that test" -> "this test" 
> (this sentence talks about relationship with other documents, so it's 
> the test in this document, not the test in that other section)
>   
Will do.
> 1.3, para 2: "a BIDI rule" -> "the BIDI rule"
>
> 1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are 
> talking about document organization, so it's "the rule in that other 
> section over there", so "here" doesn't fit)
>
> 1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section 
> 7 describe": Section 8 is IANA consideration.
>   
Will fix.
> 1.4: I have no problem following this stuff because I have worked on 
> bidi earlier, but somebody who's not familiar with BIDI will incur a 
> very steep learning curve. Either help a bit more with e.g. sentences 
> such as "for the purposes of bidirectional layout, each Unicode 
> character is assigned a BIDI property value."
>   
....or? I'd like to resist adding more tutorial material here. This 
document will remain incomprehensible until one reads the Unicode BIDI 
specification.
> 1.4: "non spacing" -> "nonspacing"
>
> 1.4, "The directionality of such examples" -> "The display order of such 
> examples"
>
> 1.4, "it means ..., approximately" -> "it approximately means"
>   
I like the other order; YMMV.
> 1.4 "An RTL label": This seems to be the definition that Protocol might 
> want to refer to.
>   
Yes.
> 1.4 'Having a separate category of "RTL domain names" would not make 
> this specification simpler, so has not been done.' -> 'Providing a 
> separate category of "RTL domain names" would not make this 
> specification simpler.'
>
> Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are 
> used, that's confusing. The term is always in singular. The document 
> works that way in general, but "The following test" at the start of 
> Section 2 is confusing, because the only 'tests' that one can see are 
> the ones labeled 1. to 6. Maybe use something like "In order to pass the 
> BIDI test, the following conditions 1. to 6. must all be satisfied."
>   
I thought I'd already added that.... will do.
> 2, conditions 2/4: Why are BN (control characters) allowed in RTL but 
> not in LTR?
>   
Error. See other thread.
> 3. "A requirement" -> "The requirement" (see above)
>
> 3., para 2: As this restricts things to the Unicode bidi algorithm, 
> please say this earlier. (see above)
>
> 3., para 3: "requirements proposed" -> "requirements" (we are working on 
> finalizing this document, we are no longer in the proposal stage)
>
> 3., requirement 2: Is the choice of 'characters delimiting the labels' 
> open, is this only the ASCII dot, is this a small set (I'm interested in 
> this both for spec clarity and because the answer might strongly affect 
> draft-duerst-iri-bis).
The formalistic part says that "delimiterchars" are of class CS, WS and ON.
For IRI: Note the comment that says that the percent sign breaks things.
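
For what it's worth, a quick way to see which characters land in that set is to look up their Bidi class. The sketch below is only an illustration with a made-up helper name; it is not text from the draft.

```python
import unicodedata

DELIMITER_CLASSES = ("CS", "WS", "ON")

def is_delimiter_char(ch: str) -> bool:
    """True if the character's Bidi class is CS, WS or ON (hypothetical helper)."""
    return unicodedata.bidirectional(ch) in DELIMITER_CLASSES

# Show the Bidi class of a few characters that turn up around domain names.
for ch in ".:/ @%":
    print(repr(ch), unicodedata.bidirectional(ch), is_delimiter_char(ch))
```

The percent sign comes out as class ET, so it falls outside the three classes listed.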
> 3, 'possible requirement' related to directionality controls:
> "(outside of the labels)" -> "(outside, but potentially directly 
> adjacent of the labels)" (does this include cases with directionality 
> controls inside a domain name, i.e. before/after a dot?)
> "the conditions above require extra testing" -> "the conditions above 
> required extra testing"
>   
It's intended to mean "not between the labels". Will clarify.
> 3., 'Delimiterchars': FULL STOP not allowed in domain names?????
>   
Should be "labels". Will fix.
> 4.1, para 1: "This marking is obligatory, and both double vowels and 
> syllable-final consonants are indicated by the marking of special 
> unvoiced characters." -> "This marking is obligatory, and syllable-final 
> consonants are indicated by a special unvoiced character."
> (double (long) vowels are indicated in Unicode by their own combining 
> mark, which is of course voiced. These are graphically in most cases 
> just duplications of the single (short) vowels. The current text 
> suggests a special "duplicate the preceding vowel" sign similar to the 
> one (sukun) for consonants, but such a suggestion is wrong.)
>   
I'll leave this to Cary....
> 4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
>   
but this one I can fix....
> 4.2: This section could be shortened considerably. "Greater latitude 
> here than ... Dhivehi." is irrelevant; as long as a significant part of 
> a language's words cannot be used in IDN, there's a problem. The 
> subsection is interesting for people interested in Yiddish, but the 
> average reader of the spec will try to find something relevant for the 
> algorithm, and mostly be more confused than enlightened.
>
> 4.3: "(with the 5 being considered right-to-left because of the leading 
> ALEF)": No, the 5 itself is never right-to-left. Change to "(the overall 
> directionality being right-to-left because of the leading ALEF)"
>
> 4.3: "but barring them both seems to require justification" -> "but 
> barring them both seems unnecessary" or "but barring them both turned 
> out to be unnecessary"
>   
I'll let Cary handle this one too. He's the Hebrew expert.
> 5. "Even if a label is registered under a "safe" label,": 'under' should 
> be explained more clearly (I assume this refers to the hierarchical 
> relationship in the DNS)
>   
It does. I assume that readers have a passing acquaintance with the DNS; 
here too, I don't want to descend into excessive hand-holding. Too much 
risk of getting it wrong.
> 5., last paragraph: It would be better to change this into a SHOULD, 
> such as "Where implementations see a way to avoid ..., they SHOULD 
> avoid". That will bring this issue on the radar screen of implementers, 
> whereas it currently will just be glossed over.
>   
This document presently does not use (or need) 2119 language.
I removed all normative statements about what people should or should 
not do in this situation after the discussion in Dublin; unless the 
Chair declares consensus otherwise, I will keep it that way.
> 6., first paragraph: "All other issues with these scripts": What scripts???
>   
Right-to-left scripts.
> 6. "wishes to create rules for the mixing of digits" -> "wishes to 
> create rules against the mixing of digits" or "wishes to restrict the 
> mixing of digits"
>   
Better. Will do.
> 6. "Rules are also specified at the protocol level, but while the 
> example above involves right-to-left characters, this is not inherently 
> a BIDI problem." -> "This example is not inherently a BIDI problem, so 
> such restrictions are not specified at the protocol level."
> ("Rules are also specified at the protocol level" is inherently vague; 
> it seems to mean "Some rules against mixing digits are also specified at 
> the protocol level, but only when this is necessary to avoid a BIDI 
> problem.")
>   
Better. Will do.
> 6. "It is unrealistic to expect that applications will display domain 
> names using embedded formatting codes between their labels (for one 
> thing, no reliable algorithms for identifying domain names in running 
> text exist);": Please add that it is also unrealistic that formatting 
> codes are removed before IDNA processing, and that allowing formatting 
> codes could lead to many kinds of 'mischief' that would go against the 
> two requirements in section 3.
>   
It's not an issue in need of resolution, so I'll skip that.
> 6. "which might surprise someone expecting to see labels displayed in 
> hierarchical order.": Please add that this may not be such a big problem 
> to general users familiar with BIDI, because they are used to 
> seeing/reading a sequence of RTL units (e.g. words) from right to left.
> (for wording alternatives, see 
> http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second 
> para*, ...)
>   
Cary mentioned that registrations under .museum show that this is not so 
clear-cut...
> 7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really 
> farfetched (not impossible just because there is no guarantee against 
> weird implementations). It would be good to indicate that somehow.
> (this includes the paragraph following bullet point 3)
>   
This came out of an exchange with Paul Hoffman, I believe. I think it's 
far-fetched too, but it is the one case that people came up with where 
the relaxation of what characters are allowed at the end of a label has 
any specific effect.
> 7.1: "The editors believe": change to something less specific; this is a 
> WG document, we either have rough consensus or we don't. (I for one 
> fully agree with this point)
>   
"The WG believes"? Haven't had much commentary on this section.  I 
really hate "It is believed", but if authorized to speak for the WG, 
I'll do so. It is a value judgment, so I believe that saying "These 
cases will... " as a statement of fact is inappropriate.
> 7.2: This should be slightly reworded to more clearly send the message 
> that changes to Unicode bidi properties, while not totally impossible, 
> are expected to be rare, and to affect mostly symbols and the like, 
> which will limit their effect on what the BIDI rule(/test) allows and 
> what not.
>   
That is something I can't say anything about. Please send text.
> 8. "It is possible that differences in the interpretation of the 
> specification": Wrong. There are no differences in interpretation for 
> the old spec. There are no differences in the interpretation of the new 
> spec. There are differences in the specs themselves.
>   
I'll change to "differences in interpretation of labels between 
implementations of IDNA2003 and IDNA2008", since that is what I think I 
intended to say.
> Regards,   Martin.
>
> P.S.: Unfortunately, I will not have time to review the remaining 
> documents (tables, rationale) during last call (or this week).
>
>