my comments on draft-ietf-idnabis-bidi-05

Wed Sep 9 04:27:52 CEST 2009

Hello Harald,

Many thanks for your thorough answers. Some additional comments inline.

On 2009/09/09 5:27, Harald Alvestrand wrote:
> Apologies for being a week late in responding. I'll try to respond to
> those issues that haven't already been beaten to death.
>
> (Cary - I need your expertise for a couple of the issues. Please help!)
>
> Martin J. Dürst wrote:
>> Abstract: Should mention bidi rules first, then changes (this has been
>> fixed in the document itself, which is great).
>>
>> Abstract, and potentially elsewhere: Avoid the word 'new'. RFCs are
>> archival documents.
> Will try. Am going blind by now to such things, though....

I understand. 'Find' may help, though :-).

>> 1.1, para 2: "When labels satisfy the rule, and when certain other
>> conditions are satisfied, they can be used with a minimal chance of
>> these labels being displayed in a confusing way by a bidirectional
>> display algorithm.": "they" .. "these labels" is confusing. What about
>> "When labels satisfy the rule, and when certain other conditions are
>> satisfied, there is only a minimal chance that these labels will be
>> displayed in a confusing way by a bidirectional display algorithm."
> Will do.
>> 1.1: "A bidirectional display algorithm": How many of them do we have?
>> (I only know one, the Unicode one (with some minor variants)). How
>> many of them have been used for testing/verification?
>>
>> 1.1, para 3: what exactly is a "right-to-left character"?
> Type R or AL. I'll add this to definitions. AN is a bit weird.
>> 1.2: This section ideally should also be moved to after Section 2.
> I don't agree; I prefer to have the context-setting done first, and
> splitting the section to sort out what's not necessary context-setting
> is more work than I care for.
>> 1.2, para 1: "The IDNA specification "Stringprep"": change to
>> something like "Stringprep, part of IDNA2003". Otherwise, it's not
>> clear that this is an old spec.
> Will do. Will also change the entire 2003 description to past tense.
>> 1.2, para 4: "However, this makes certain words" -> "However, this
>> made certain words" (past tense)
>>
>> 1.2, para 7: "While the document specifies rules" -> "While this
>> document specifies rules"
> Will do.
>> 1.2, para 7: "(the most important being label that mix Arabic and
>> European digits (AN and EN) inside an RTL label, and labels that use
>> AN in an LTR label)": Very weird. Such cases may not be completely
>> impossible, but they are much less frequent than e.g. Arabic numbers
>> inside Arabic letters, European numbers inside Arabic letters, and so
>> on. There was even a strong movement to prohibit number mixing at the
>> protocol level; this would never have happened if such mixing had
>> been deemed to be "most important". Also, after looking at the
>> actual conditions, we either have an RTL label, which by condition 4
>> excludes mixing EN and AN, or we have an LTR label, which by condition
>> 5 excludes AN and therefore the mixture of EN and AN.
> The commentary on version 4 asked for specific examples of strings that
> were allowed under IDNA2003's BIDI rule, but disallowed under this
> specification. This is it.

Oh, that's what it's supposed to be. But because it says "allowed under 
this specification" immediately before the parenthesis, it looks like 
these are still allowed.

> Is it possible to make this clearer?

I very much think so. I propose to change:

    While the document specifies rules quite different from RFC 3454,
    most reasonable labels that were allowed under RFC 3454 will also be
    allowed under this specification (the most important being labels
    that mix Arabic and European digits (AN and EN) inside an RTL label,
    and labels that use AN in an LTR label), so the operational impact of
    using the new rule in the updated IDNA specification is limited.

To:

    While the document specifies rules quite different from RFC 3454,
    most reasonable labels that were allowed under RFC 3454 will also be
    allowed under this specification, so the operational impact of
    using the new rule in the updated IDNA specification is limited. The
    most important cases that are no longer allowed are labels
    that mix Arabic and European digits (AN and EN) inside an RTL label,
    and labels that use AN in an LTR label.
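For illustration only, here is a minimal Python sketch (not part of the draft) that flags the two excluded cases. It uses the Bidi_Class values from Python's standard `unicodedata` module. As in condition 1, the label's direction comes from its first character: L means LTR, and R or AL means RTL. The function name `excluded_case` is hypothetical.

```python
import unicodedata
from typing import Optional

def excluded_case(label: str) -> Optional[str]:
    """Report which of the two no-longer-allowed cases a label falls into."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return None
    first = classes[0]
    # RTL label (first character R or AL) that mixes European and Arabic digits
    if first in ("R", "AL") and "EN" in classes and "AN" in classes:
        return "RTL label mixing EN and AN"
    # LTR label (first character L) that contains an Arabic digit
    if first == "L" and "AN" in classes:
        return "LTR label containing AN"
    return None

# ARABIC LETTER ALEF, DIGIT ONE, ARABIC-INDIC DIGIT ONE
print(excluded_case("\u0627" "1" "\u0661"))  # RTL label mixing EN and AN
print(excluded_case("abc\u0661"))            # LTR label containing AN
print(excluded_case("abc1"))                 # None
```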

>> 1.3, title: "Layout" -> "Structure" or "Organization"
> I prefer "layout", but "structure" is also OK with me. Will change.

[When I hear 'layout', I think about 'what goes on which page'.]

>> 1.3, para 1: Change from "bidi test" to "bidi rule". (or unify otherwise)
>>
>> 1.3, para 1: ", that" -> ", which"
> when to use "which" and when to use "that" seems to be a bone of
> contention among linguistically-competent people.

The RFC Editor can fix that.

>> 1.3, para 1: "no matter what the direction of the label is": What does
>> this mean? It could either mean that you can apply the test forwards
>> or backwards, or it could mean that it doesn't depend on what
>> directionality the characters in the label have, or whatever. In the
>> later case, I'd write e.g.: "This test [->rule, see above and below]
>> can be applied to any kind of label, but becomes trivial if the input
>> is guaranteed to contain only LTR characters."
> It means that you can apply the test to both RTL and LTR labels.
> It's not trivial for LTR labels either.

I see. I think the text is fine as it is, then.
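The rule covers both directions, so a compact sketch of all six conditions may help show that the LTR branch is not trivial either. This is a minimal Python illustration, not the draft's own text. It follows the conditions as worded in the eventually published RFC 5893, so details such as the handling of BN may differ from draft-05. It reads Bidi_Class from Python's `unicodedata`, and the function name `passes_bidi_rule` is hypothetical.

```python
import unicodedata

# Classes permitted by condition 2 (RTL labels) and condition 5 (LTR labels)
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}

def passes_bidi_rule(label: str) -> bool:
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return False
    # Condition 1: the first character must be L, R or AL; it fixes the direction.
    rtl = classes[0] in ("R", "AL")
    if rtl:
        allowed, end_classes = RTL_ALLOWED, {"R", "AL", "EN", "AN"}
    elif classes[0] == "L":
        allowed, end_classes = LTR_ALLOWED, {"L", "EN"}
    else:
        return False
    # Conditions 2 and 5: every character must have a permitted class.
    if any(c not in allowed for c in classes):
        return False
    # Conditions 3 and 6: after dropping trailing NSM, check the last class.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    if end == 0 or classes[end - 1] not in end_classes:
        return False
    # Condition 4: in an RTL label, EN and AN must not both be present.
    if rtl and "EN" in classes and "AN" in classes:
        return False
    return True

for label in ["example", "abc-1", "\u05d0\u05d1\u05d2", "1abc", "\u05d0-", "abc\u0661"]:
    print(repr(label), passes_bidi_rule(label))
```

In this sketch, a label such as "abc\u0661" is rejected only by the LTR branch (condition 5). That is the sense in which the test is not trivial for LTR labels.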

>> 1.3: "The primary initial use of that test": "that test" -> "this
>> test" (this sentence talks about relationship with other documents, so
>> it's the test in this document, not the test in that other section)
> Will do.
>> 1.3, para 2: "a BIDI rule" -> "the BIDI rule"
>>
>> 1.3, para 3: "new rule proposed here" -> "new rule proposed" (we are
>> talking about document organization, so it's "the rule in that other
>> section over there", so "here" doesn't fit)
>>
>> 1.3, para 4: "Section 5 to Section 9 describe" -> "Section 5 to
>> Section 7 describe": Section 8 is IANA consideration.
> Will fix.
>> 1.4: I have no problem following this stuff because I have worked on
>> bidi earlier, but somebody who's not familiar with BIDI will incur a
>> very steep learning curve. Either help a bit more with e.g. sentences
>> such as "for the purposes of bidirectional layout, each Unicode
>> character is assigned a BIDI property value."
> ....or? I'd like to resist adding more tutorial material here. This
> document will remain incomprehensible until one reads the Unicode BIDI
> specification.
>> 1.4: "non spacing" -> "nonspacing"
>>
>> 1.4, "The directionality of such examples" -> "The display order of
>> such examples"
>>
>> 1.4, "it means ..., approximately" -> "it approximately means"
> I like the other order; YMMV.
>> 1.4 "An RTL label": This seems to be the definition that Protocol
>> might want to refer to.
> Yes.
>> 1.4 'Having a separate category of "RTL domain names" would not make
>> this specification simpler, so has not been done.' -> 'Providing a
>> separate category of "RTL domain names" would not make this
>> specification simpler.'
>>
>> Section 2 (title), and elsewhere: Both "Bidi rule" and "Bidi test" are
>> used; that's confusing. The term is always in the singular. The document
>> works that way in general, but "The following test" at the start of
>> Section 2 is confusing, because the only 'tests' that one can see are
>> the ones labeled 1. to 6. Maybe use something like "In order to pass
>> the BIDI test, the following conditions 1. to 6. must all be satisfied."
> I thought I'd already added that.... will do.
>> 2, conditions 2/4: Why are BN (control characters) allowed in RTL but
>> not in LTR?
> Error. See other thread.
>> 3. "A requirement" -> "The requirement" (see above)
>>
>> 3., para 2: As this restricts things to the Unicode bidi algorithm,
>> please say this earlier. (see above)
>>
>> 3., para 3: "requirements proposed" -> "requirements" (we are working
>> on finalizing this document, we are no longer in the proposal stage)
>>
>> 3., requirement 2: Is the choice of 'characters delimiting the labels'
>> open, is this only the ASCII dot, is this a small set (I'm interested
>> in this both for spec clarity and because the answer might strongly
>> affect draft-duerst-iri-bis).
> The formalistic part says that "delimiterchars" are of class CS, WS and ON.
> For IRI: Note the comment that says that the percent sign breaks things.

Okay, thanks!
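For illustration only, here is a minimal Python sketch of the delimiter condition. It checks whether a candidate delimiter has one of the classes CS, WS or ON, reading Bidi_Class from Python's `unicodedata`. The name `is_acceptable_delimiter` is hypothetical.

```python
import unicodedata

# Classes the formal requirement lists for "delimiterchars"
DELIMITER_CLASSES = {"CS", "WS", "ON"}

def is_acceptable_delimiter(ch: str) -> bool:
    """True if ch has one of the Bidi classes listed for delimiterchars."""
    return unicodedata.bidirectional(ch) in DELIMITER_CLASSES

for ch in [".", "/", ":", " ", "@", "%", "-"]:
    print(repr(ch), unicodedata.bidirectional(ch), is_acceptable_delimiter(ch))
```

Here ".", "/" and ":" have class CS, the space has WS and "@" has ON. The percent sign has class ET and the hyphen-minus has ES, so neither qualifies. This matches the remark that the percent sign breaks things.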

>> 3, 'possible requirement' related to directionality controls:
>> "(outside of the labels)" -> "(outside, but potentially directly
>> adjacent to the labels)" (does this include cases with directionality
>> controls inside a domain name, i.e. before/after a dot?)
>> "the conditions above require extra testing" -> "the conditions above
>> required extra testing"
> It's intended to mean "not between the labels". Will clarify.
>> 3., 'Delimiterchars': FULL STOP not allowed in domain names?????
> Should be "labels". Will fix.
>> 4.1, para 1: "This marking is obligatory, and both double vowels and
>> syllable-final consonants are indicated by the marking of special
>> unvoiced characters." -> "This marking is obligatory, and
>> syllable-final consonants are indicated a special unvoiced character."
>> (double (long) vowels are indicated in Unicode by their own combining
>> mark, which is of course voiced. These are graphically in most cases
>> just duplications of the single (short) vowels. The current text
>> suggests a special "duplicate the preceding vowel" sign similar to
>> the one (sukun) for consonants, but such a suggestion is wrong.)
> I'll leave this to Cary....

(note that in my new text, I forgot a 'by': "indicated a special" -> 
"indicated by a special")
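On 4.1 more generally: the sukun is a combining mark with Bidi class NSM. The rule lets an RTL label end in NSM as long as the last non-NSM character is R, AL, EN or AN. Below is a minimal Python sketch of just that end-of-label check, with a hypothetical function name. The sample label is an arbitrary sequence of Thaana characters ending in a sukun, chosen for illustration and not meant as a real Dhivehi word.

```python
import unicodedata

def rtl_label_end_ok(label: str) -> bool:
    """End-of-label check for RTL labels: skip trailing NSM characters,
    then require R, AL, EN or AN."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    while classes and classes[-1] == "NSM":
        classes.pop()
    return bool(classes) and classes[-1] in {"R", "AL", "EN", "AN"}

# THAANA LETTER KAAFU, THAANA ABAFILI, THAANA LETTER NOONU, THAANA SUKUN
label = "\u0786\u07a6\u0782\u07b0"
print([unicodedata.bidirectional(ch) for ch in label])
print(rtl_label_end_ok(label))
```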

>> 4.1, Thaana 'Computer' example: "UBIUFILI" -> "UBUFILI"
> but this one I can fix....
>> 4.2: This section could be shortened considerably. "Greater latitude
>> here than ... Dhivehi." is irrelevant; as long as a significant part
>> of a language's words cannot be used in IDN, there's a problem. The
>> subsection is interesting for people interested in Yiddish, but the
>> average reader of the spec will try to find something relevant for the
>> algorithm, and mostly be more confused than enlightened.
>>
>> 4.3: "(with the 5 being considered right-to-left because of the
>> leading ALEF)": No, the 5 itself is never right-to-left. Change to
>> "(the overall directionality being right-to-left because of the
>> leading ALEF)"
>>
>> 4.3: "but barring them both seems to require justification" -> "but
>> barring them both seems unnecessary" or "but barring them both turned
>> out to be unnecessary"
> I'll let Cary handle this one too. He's the Hebrew expert.
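On the first 4.3 point: under condition 1, a label's direction is decided by its first character alone. A digit keeps its own class (EN for "5") wherever it appears. Below is a minimal Python sketch with a hypothetical helper name, using a Hebrew ALEF as the leading character.

```python
import unicodedata
from typing import Optional

def label_direction(label: str) -> Optional[str]:
    """Direction per condition 1: decided by the first character only."""
    if not label:
        return None
    first = unicodedata.bidirectional(label[0])
    if first in ("R", "AL"):
        return "RTL"
    if first == "L":
        return "LTR"
    return None  # e.g. a leading digit (EN or AN) does not start a valid label

label = "\u05d05"  # HEBREW LETTER ALEF followed by DIGIT FIVE
print([unicodedata.bidirectional(ch) for ch in label])  # ['R', 'EN']
print(label_direction(label))                           # RTL
```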
>> 5. "Even if a label is registered under a "safe" label,": 'under'
>> should be explained more clearly (I assume this refers to the
>> hierarchical relationship in the DNS)
> It does. I assume that readers have a passing acquaintance with the DNS;
> here too, I don't want to descend into excessive hand-holding. Too much
> risk of getting it wrong.
>> 5., last paragraph: It would be better to change this into a SHOULD,
>> such as "Where implementations see a way to avoid ..., they SHOULD
>> avoid". That will bring this issue on the radar screen of
>> implementers, whereas it currently will just be glossed over.
> This document presently does not use (or need) 2119 language.
> I removed all normative statements about what people should or should
> not do in this situation after the discussion in Dublin; unless the
> Chair declares consensus otherwise, I will keep it that way.
>> 6., first paragraph: "All other issues with these scripts": What
>> scripts???
> Right-to-left scripts.

Please make that explicit.

>> 6. "wishes to create rules for the mixing of digits" -> "wishes to
>> create rules against the mixing of digits" or "wishes to restrict the
>> mixing of digits"
> better. Will do.
>> 6. "Rules are also specified at the protocol level, but while the
>> example above involves right-to-left characters, this is not
>> inherently a BIDI problem." -> "This example is not inherently a BIDI
>> problem, so such restrictions are not specified at the protocol level."
>> ("Rules are also specified at the protocol level" is inherently vague;
>> it seems to mean "Some rules against mixing digits are also specified
>> at the protocol level, but only when this is necessary to avoid a BIDI
>> problem.")
> Better. Will do.
>> 6. "It is unrealistic to expect that applications will display domain
>> names using embedded formatting codes between their labels (for one
>> thing, no reliable algorithms for identifying domain names in running
>> text exist);": Please add that it is also unrealistic that formatting
>> codes are removed before IDNA processing, and that allowing formatting
>> codes could lead to many kinds of 'mischief' that would go against the
>> two requirements in section 3.
> It's not an issue in need of resolution, so I'll skip that.
>> 6. "which might surprise someone expecting to see labels displayed in
>> hierarchical order.": Please add that this may not be such a big
>> problem to general users familiar with BIDI, because they are used to
>> seeing/reading a sequence of RTL units (e.g. words) from right to left.
>> (for wording alternatives, see
>> http://tools.ietf.org/html/rfc3987#section-4.4, first para, *second
>> para*, ...)
> Cary mentioned that registrations under .museum show that this is not so
> clear-cut...

Of course it's not clear-cut. That's why I'm not proposing to take out 
the "might surprise someone" bit. But I think it's very helpful and 
important to tell people how they can look at it in a way that makes 
some sense. This can greatly affect acceptance.

>> 7.1: Bullet points 1 and 2 are major, whereas bullet point 3 is really
>> farfetched (not impossible just because there is no guarantee against
>> weird implementations). It would be good to indicate that somehow.
>> (this includes the paragraph following bullet point 3)
> This came out of an exchange with Paul Hoffman, I believe. I think it's
> far-fetched too, but it is the one case that people came up with where
> the relaxation of what characters are allowed at the end of a label has
> any specific effect.
>> 7.1: "The editors believe": change to something less specific; this is
>> a WG document, we either have rough consensus or we don't. (I for one
>> fully agree with this point)
> "The WG believes"? Haven't had much commentary on this section. I really
> hate "It is believed", but if authorized to speak for the WG, I'll do
> so. It is a value judgment, so I believe that saying "These cases
> will... " as a statement of fact is inappropriate.

"It is believed" is indeed a bit strange. What about "It was judged"?

>> 7.2: This should be slightly reworded to more clearly send the message
>> that changes to Unicode bidi properties, while not totally impossible,
>> are expected to be rare, and to affect mostly symbols and the like,
>> which will limit their effect on what the BIDI rule(/test) allows and
>> what not.
> That is something I can't say anything about. Please send text.

I propose to change:

    However, the determination of validity for any string depends on the
    Unicode BIDI property values, which are not declared immutable by the
    Unicode Consortium.  Furthermore, the behaviour of the algorithm for
    any given character is likely to be linguistically and culturally
    sensitive, so that it is not unlikely that later versions of the
    Unicode standard may change the BIDI properties assigned to certain
    Unicode characters.

To something like:

    The determination of validity for any string depends on the
    Unicode BIDI property values. These properties are not declared
    immutable by the Unicode Consortium, but changes are highly unlikely.

[I'm not speaking for the Unicode Consortium, and can't give any data. Ken?]
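On why property stability matters in practice: an implementation reads Bidi_Class from whatever Unicode Character Database it ships with. In principle, the same label could therefore be judged differently under different database versions. Below is a minimal Python sketch, with a hypothetical helper name, that records `unicodedata.unidata_version` alongside the classes it looked up, so that results from different versions can be told apart.

```python
import unicodedata
from typing import List, Tuple

def bidi_classes_with_version(label: str) -> Tuple[str, List[str]]:
    """Return the UCD version used and the Bidi_Class of each character.

    unicodedata.bidirectional() returns "" for code points that have no
    value defined in this database version.
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]
    return unicodedata.unidata_version, classes

version, classes = bidi_classes_with_version("a1\u05d0")
print(version, classes)
```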

>> 8. "It is possible that differences in the interpretation of the
>> specification": Wrong. There are no differences in interpretation for
>> the old spec. There are no differences in the interpretation of the
>> new spec. There are differences in the specs themselves.
> I'll change to "differences in interpretation of labels between
> implementations of IDNA2003 and IDNA2008", since that was what I think I
> intended to say.

Okay.

Regards, Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp