my comments on draft-ietf-idnabis-bidi-05
Harald Alvestrand
harald at alvestrand.no
Tue Sep 8 22:27:29 CEST 2009
Apologies for being a week late in responding. I'll try to respond to
those issues that haven't already been beaten to death.
(Cary - I need your expertise for a couple of the issues. Please help!)
Martin J. Dürst wrote:
> Abstract: Should mention bidi rules first, then changes (this has been
> fixed in the document itself, which is great).
>
> Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are
> archival documents.
>
Will try. Am going blind by now to such things, though....
> 1.1, para 2: "When labels satisfy the rule, and when certain other
> conditions are satisfied, they can be used with a minimal chance of
> these labels being displayed in a confusing way by a bidirectional
> display algorithm.": "they" .. "these labels" is confusing. What about
> "When labels satisfy the rule, and when certain other conditions are
> satisfied, there is only a minimal chance that these labels will be
> displayed in a confusing way by a bidirectional display algorithm."
>
Will do.
> 1.1: "A bidirectional display algorithm": How many of them do we have?
> (I only know one, the Unicode one (with some minor variants)). How many
> of them have been used for testing/verification?
>
> 1.1, para 3: what exactly is a "right-to-left character"?
>
Type R or AL. I'll add this to definitions. AN is a bit weird.
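For illustration only (this is not text from the draft): the definition
above can be sketched with Python's standard unicodedata module, which
reports each character's Bidi class. The helper name is_rtl_char is
made up for this example.

```python
import unicodedata

def is_rtl_char(ch: str) -> bool:
    """Return True if ch has Bidi class R or AL, per the definition above."""
    return unicodedata.bidirectional(ch) in ("R", "AL")

# A few sample characters and their classes.
for ch in ("\u05d0", "\u0627", "a", "5", "\u0661"):
    print(unicodedata.name(ch), unicodedata.bidirectional(ch), is_rtl_char(ch))
```

ARABIC-INDIC DIGIT ONE (U+0661) has class AN, so it does not count as a
right-to-left character under this definition, which is where AN gets
"a bit weird".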
> 1.2: This section ideally should also be moved to after Section 2.
>
I don't agree; I prefer to have the context-setting done first, and
splitting the section to sort out what's not necessary context-setting
is more work than I care for.
> 1.2, para 1: "The IDNA specification "Stringprep"": change to something
> like "Stringprep, part of IDNA2003". Otherwise, it's not clear that this
> is an old spec.
>
Will do. Will also change the entire 2003 description to past tense.
> 1.2, para 4: "However, this makes certain words" -> "However, this made
> certain words" (past tense)
>
> 1.2, para 7: "While the document specifies rules" -> "While this
> document specifies rules"
>
Will do.
> 1.2, para 7: "(the most important being label that mix Arabic and
> European digits (AN and EN) inside an RTL label, and labels that use AN
> in an LTR label)": Very weird. Such cases may not be completely
> impossible, but they are much less frequent than e.g. Arabic numbers
> inside Arabic letters, European numbers inside Arabic letters, and so
> on. There was even a strong movement to prohibit number mixing at the
> protocol level; this would never have happened if such mixing had
> been deemed to be "most important". Also, after looking at the actual
> conditions, we either have an RTL label, which by condition 4 excludes
> mixing EN and AN, or we have an LTR label, which by condition 5 excludes
> AN and therefore the mixture of EN and AN.
>
The commentary on version 4 asked for specific examples of strings that
were allowed under IDNA2003's BIDI rule, but disallowed under this
specification. This is it.
Is it possible to make this clearer?
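To make the two cases concrete, here is a minimal sketch of only the
digit-related part of the check, as summarised in the quoted comment
(condition 4: no mixing of EN and AN in an RTL label; condition 5: no
AN in an LTR label). It is not the full rule; the function name is
hypothetical, and the caller is assumed to know already whether the
label is RTL.

```python
import unicodedata

def digit_conditions_ok(label: str, is_rtl: bool) -> bool:
    """Check only the EN/AN restrictions described above (not the whole rule)."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    if is_rtl:
        # Condition 4: an RTL label may not contain both EN and AN.
        return not ("EN" in classes and "AN" in classes)
    # Condition 5: an LTR label may not contain AN at all.
    return "AN" not in classes

# RTL label mixing a European digit and an Arabic-Indic digit.
print(digit_conditions_ok("\u0627" + "1" + "\u0661", is_rtl=True))
# LTR label containing an Arabic-Indic digit.
print(digit_conditions_ok("a" + "\u0661", is_rtl=False))
```

Both sample labels fail this check, which matches the two kinds of
string named in the parenthesis.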
> 1.3, title: "Layout" -> "Structure" or "Organization"
>
I prefer "layout", but "structure" is also OK with me. Will change.
> 1.3, para 1: Change from "bidi test" to "bidi rule". (or unify otherwise)
>
> 1.3, para 1: ", that" -> ", which"
>
When to use "which" and when to use "that" seems to be a bone of
contention among linguistically-competent people.
> 1.3, para 1: "no matter what the direction of the label is": What does
> this mean? It could either mean that you can apply the test forwards or
> backwards, or it could mean that it doesn't depend on what
> directionality the characters in the label have, or whatever. In the
> latter case, I'd write e.g.: "This test [->rule, see above and below] can
> be applied to any kind of label, but becomes trivial if the input is
> guaranteed to contain only LTR characters."
>
It means that you can apply the test to both RTL and LTR labels.
It's not trivial for LTR labels either.
> 1.3: "The primary initial use of that test": "that test" -> "this test"
> (this sentence talks about relationship with other documents, so it's
> the test in this document, not the test in that other section)
>
Will do.
> 1.3, para 2: "a BIDI rule" -> "the BIDI rule"
>
> 1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are
> talking about document organization, so it's "the rule in that other
> section over there", so "here" doesn't fit)
>
> 1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to Section
> 7 describe": Section 8 is IANA consideration.
>
Will fix.
> 1.4: I have no problem following this stuff because I have worked on
> bidi earlier, but somebody who's not familiar with BIDI will incur a
> very steep learning curve. Either help a bit more with e.g. sentences
> such as "for the purposes of bidirectional layout, each Unicode
> character is assigned a BIDI property value."
>
....or? I'd like to resist adding more tutorial material here. This
document will remain incomprehensible until one reads the Unicode BIDI
specification.
> 1.4: "non spacing" -> "nonspacing"
>
> 1.4, "The directionality of such examples" -> "The display order of such
> examples"
>
> 1.4, "it means ..., approximately" -> "it approximately means"
>
I like the other order; YMMV.
> 1.4 "An RTL label": This seems to be the definition that Protocol might
> want to refer to.
>
Yes.
> 1.4 'Having a separate category of "RTL domain names" would not make
> this specification simpler, so has not been done.' -> 'Providing a
> separate category of "RTL domain names" would not make this
> specification simpler.'
>
> Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are
> used, that's confusing. The term is always in singular. The document
> works that way in general, but "The following test" at the start of
> Section 2 is confusing, because the only 'tests' that one can see are
> the ones labeled 1. to 6. Maybe use something like "In order to pass the
> BIDI test, the following conditions 1. to 6. must all be satisfied."
>
I thought I'd already added that.... will do.
> 2, conditions 2/4: Why are BN (control characters) allowed in RTL but
> not in LTR?
>
Error. See other thread.
> 3. "A requirement" -> "The requirement" (see above)
>
> 3., para 2: As this restricts things to the Unicode bidi algorithm,
> please say this earlier. (see above)
>
> 3., para 3: "requirements proposed" -> "requirements" (we are working on
> finalizing this document, we are no longer in the proposal stage)
>
> 3., requirement 2: Is the choice of 'characters delimiting the labels'
> open, is this only the ASCII dot, is this a small set (I'm interested in
> this both for spec clarity and because the answer might strongly affect
> draft-duerst-iri-bis).
The formalistic part says that "delimiterchars" are of class CS, WS and ON.
For IRI: Note the comment that says that the percent sign breaks things.
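As a rough illustration (an assumption-laden sketch, not the draft's
formal definition), the class test for delimiterchars can be written
with Python's unicodedata module. The names DELIMITER_CLASSES and
is_delimiterchar are invented for this example.

```python
import unicodedata

# Bidi classes named above for "delimiterchars".
DELIMITER_CLASSES = {"CS", "WS", "ON"}

def is_delimiterchar(ch: str) -> bool:
    """Return True if ch falls in one of the delimiter classes."""
    return unicodedata.bidirectional(ch) in DELIMITER_CLASSES

# Characters that commonly surround domain names in running text and IRIs.
for ch in ". /:@%":
    print(repr(ch), unicodedata.bidirectional(ch), is_delimiterchar(ch))
```

FULL STOP (U+002E) has class CS and so passes; PERCENT SIGN (U+0025)
has class ET and so does not, which is relevant to the IRI case.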
> 3, 'possible requirement' related to directionality controls:
> "(outside of the labels)" -> "(outside, but potentially directly
> adjacent to the labels)" (does this include cases with directionality
> controls inside a domain name, i.e. before/after a dot?)
> "the conditions above require extra testing" -> "the conditions above
> required extra testing"
>
It's intended to mean "not between the labels". Will clarify.
> 3., 'Delimiterchars': FULL STOP not allowed in domain names?????
>
Should be "labels". Will fix.
> 4.1, para 1: "This marking is obligatory, and both double vowels and
> syllable-final consonants are indicated by the marking of special
> unvoiced characters." -> "This marking is obligatory, and syllable-final
> consonants are indicated by a special unvoiced character."
> (double (long) vowels are indicated in Unicode by their own combining
> mark, which is of course voiced. These are graphically in most cases
> just duplications of the single (short) vowels. The current text
> suggests a special "duplicate the preceding vowel" sign similar to the
> one (sukun) for consonants, but such a suggestion is wrong.)
>
I'll leave this to Cary....
> 4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
>
but this one I can fix....
> 4.2: This section could be shortened considerably. "Greater latitude
> here than ... Dhivehi." is irrelevant; as long as a significant part of
> a language's words cannot be used in IDN, there's a problem. The
> subsection is interesting for people interested in Yiddish, but the
> average reader of the spec will try to find something relevant for the
> algorithm, and mostly be more confused than enlightened.
>
> 4.3: "(with the 5 being considered right-to-left because of the leading
> ALEF)": No, the 5 itself is never right-to-left. Change to "(the overall
> directionality being right-to-left because of the leading ALEF)"
>
> 4.3: "but barring them both seems to require justification" -> "but
> barring them both seems unnecessary" or "but barring them both turned
> out to be unnecessary"
>
I'll let Cary handle this one too. He's the Hebrew expert.
> 5. "Even if a label is registered under a "safe" label,": 'under' should
> be explained more clearly (I assume this refers to the hierarchical
> relationship in the DNS)
>
It does. I assume that readers have a passing acquaintance with the DNS;
here too, I don't want to descend into excessive hand-holding. Too much
risk of getting it wrong.
> 5., last paragraph: It would be better to change this into a SHOULD,
> such as "Where implementations see a way to avoid ..., they SHOULD
> avoid". That will bring this issue on the radar screen of implementers,
> whereas it currently will just be glossed over.
>
This document presently does not use (or need) 2119 language.
I removed all normative statements about what people should or should
not do in this situation after the discussion in Dublin; unless the
Chair declares consensus otherwise, I will keep it that way.
> 6., first paragraph: "All other issues with these scripts": What scripts???
>
Right-to-left scripts.
> 6. "wishes to create rules for the mixing of digits" -> "wishes to
> create rules against the mixing of digits" or "wishes to restrict the
> mixing of digits"
>
Better. Will do.
> 6. "Rules are also specified at the protocol level, but while the
> example above involves right-to-left characters, this is not inherently
> a BIDI problem." -> "This example is not inherently a BIDI problem, so
> such restrictions are not specified at the protocol level."
> ("Rules are also specified at the protocol level" is inherently vague;
> it seems to mean "Some rules against mixing digits are also specified at
> the protocol level, but only when this is necessary to avoid a BIDI
> problem.")
>
Better. Will do.
> 6. "It is unrealistic to expect that applications will display domain
> names using embedded formatting codes between their labels (for one
> thing, no reliable algorithms for identifying domain names in running
> text exist);": Please add that it is also unrealistic that formatting
> codes are removed before IDNA processing, and that allowing formatting
> codes could lead to many kinds of 'mischief' that would go against the
> two requirements in section 3.
>
It's not an issue in need of resolution, so I'll skip that.
> 6. "which might surprise someone expecting to see labels displayed in
> hierarchical order.": Please add that this may not be such a big problem
> to general users familiar with BIDI, because they are used to
> seeing/reading a sequence of RTL units (e.g. words) from right to left.
> (for wording alternatives, see
> http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second
> para*, ...)
>
Cary mentioned that registrations under .museum show that this is not so
clear-cut...
> 7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really
> far-fetched (not impossible just because there is no guarantee against
> weird implementations). It would be good to indicate that somehow.
> (this includes the paragraph following bullet point 3)
>
This came out of an exchange with Paul Hoffman, I believe. I think it's
far-fetched too, but it is the one case that people came up with where
the relaxation of what characters are allowed at the end of a label has
any specific effect.
> 7.1: "The editors believe": change to something less specific; this is a
> WG document, we either have rough consensus or we don't. (I for one
> fully agree with this point)
>
"The WG believes"? Haven't had much commentary on this section. I
really hate "It is believed", but if authorized to speak for the WG,
I'll do so. It is a value judgment, so I believe that saying "These
cases will... " as a statement of fact is inappropriate.
> 7.2: This should be slightly reworded to more clearly send the message
> that changes to Unicode bidi properties, while not totally impossible,
> are expected to be rare, and to affect mostly symbols and the like,
> which will limit their effect on what the BIDI rule(/test) allows and
> what not.
>
That is something I can't say anything about. Please send text.
> 8. "It is possible that differences in the interpretation of the
> specification": Wrong. There are no differences in interpretation for
> the old spec. There are no differences in the interpretation of the new
> spec. There are differences in the specs themselves.
>
I'll change to "differences in interpretation of labels between
implementations of IDNA2003 and IDNA2008", since that was what I think I
intended to say.
> Regards, Martin.
>
> P.S.: Unfortunately, I will not have time to review the remaining
> documents (tables, rationale) during last call (or this week).
>
>