my comments on draft-ietf-idnabis-protocol-14 (second part)

Tue Sep 1 08:30:38 CEST 2009

(second part of my comments)

Section 5:

para 2: "The two steps described in Section 5.2 are required.":
Superfluous. Make sure there's a MUST at the right place in that 
section. (Looking at 5.2, I have no clue what the two steps should be.
This shows that indirect requirements like the above are rather unhelpful.)

5.1, first paragraph: Although IDNs will often get extracted from IRIs 
or URIs, there are many cases where these constructs are not involved. 
Examples would be telnet or ping commands, and so on. So IRIs and URIs 
should be deemphasized more.

5.1: "Processing in this step and the next two are local matters, to be 
accomplished prior to actual invocation of IDNA.": Again, which steps? 
Before, we supposedly had two steps in 5.2, now it looks as if we are 
talking about 5.2 and 5.3 as two steps. -> Create a subsection such as 
"Input preparation" or similar, where all the preliminary material goes. 
Alternatively, talk about subsections, with subsection numbers for clear 
identification.

5.2: "is not already Unicode" -> "is not already in Unicode" (in 
parallel to 'into' in the line before)

5.2 "A Unicode string may require normalization as discussed in Section 
4.1.": There is no "discussion" in 4.1 (and no need for discussion). 
Express the requirements here independently of Section 4.

5.3: (just checking) "See the Name Server Considerations section of 
[IDNA2008-Rationale] for additional discussion on this topic.": From the 
context, Name Server doesn't look related (we are client-side here).

5.3: "That conversion and testing SHOULD": Replace 'That' with something 
clearer and more precise.

5.3, para 2: List the possible alternatives explicitly. Avoid mishmash 
textual paragraphs.

5.4, para 1: Mishmash again. Most of this para is best removed.

5.4, para 1: "Putative labels": Both in Section 4 and 5, labels are for 
the most part putative, because they don't conform to the definitions 
unless checked. Either before section 4, or once at the start (Input 
subsection) of both section 4 and section 5, say that for the most part, 
we are dealing with putative labels, but 'putative' isn't repeated all 
the time to make the text easier to read.

5.4, page 12: Finally a bullet list. I almost thought that the author 
didn't know how to create bullet lists, or was of the opinion that 
bullet lists don't have a place in a spec. Quite to the contrary, please 
make sure there are many more bullet lists. It will make everything much 
easier to read and clearer.

5.4: "Labels that are not in NFC form as defined in [Unicode-UAX15].": 
There is only one definition of NFC, but the sentence suggests there are 
several. Please change to "Labels that are not in NFC [Unicode-UAX15]."
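
As a rough illustration of what "not in NFC" means in practice, here is a 
minimal Python sketch using the standard unicodedata module. The helper 
name is_nfc is my own and not taken from the draft; the check simply 
compares a label with its NFC-normalized form.

```python
import unicodedata


def is_nfc(label: str) -> bool:
    """Return True if the label is already in Normalization Form C.

    Hypothetical helper for illustration only; the name is not from the draft.
    """
    return unicodedata.normalize("NFC", label) == label


composed = "caf\u00e9"     # 'e' with acute as one precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(is_nfc(composed))    # the precomposed form is already NFC
print(is_nfc(decomposed))  # the decomposed form is not
```

Under the bullet as worded, the second string would be rejected at lookup 
rather than silently normalized.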

5.4: Please move bullet 1 (UNASSIGNED) and bullet 4 (DISALLOWED) and all 
the other table-related bullets together. I think it's best to put 
UNASSIGNED last (and mention that this is the category most subject to 
change).

5.4: Streamline the wording used to refer to Tables and a category. 
Currently, we have:
in the UNASSIGNED category of [IDNA2008-Tables]
in the "DISALLOWED" category in the permitted character table 
[IDNA2008-Tables]
that are identified in [IDNA2008-Tables] as "CONTEXTJ"

5.4: "Labels whose first character is a combining mark (see Section 
4.2.3.2).": Refer directly to the relevant Unicode definition, rather 
than to section 4.2.3.2 (which contains a MUST, which is already 
implicit here).
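
To show how directly this test maps onto Unicode data, here is a small 
Python sketch. It assumes that "combining mark" is taken to mean a 
character whose General_Category is M (Mn, Mc or Me); that reading is my 
assumption, and the function name is hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """Return True if the first character has General_Category Mn, Mc or Me.

    Assumption: 'combining mark' means General_Category M. An empty label
    has no first character and therefore yields False.
    """
    return bool(label) and unicodedata.category(label[0]).startswith("M")


print(starts_with_combining_mark("\u0301abc"))  # leading COMBINING ACUTE ACCENT
print(starts_with_combining_mark("abc"))        # ordinary letter first
```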

5.4: "In any event, lookup applications should avoid attempting to 
resolve labels that are invalid under that test.": Remove. We already 
have a SHOULD, no need for a should on top of that.

5.4, last para: I assume this is, e.g., about labels with mixed 
scripts and the like. What it essentially seems to say is that a browser may warn 
users if it detects mixed scripts, but if the user still wants to see 
the page, s/he is entitled to it. In such a context, the word 'validity' 
seems quite a bit out of place; it would be better to speak about 'other 
tests' or some such in a more general way.

5.5, para 1: "using the Punycode algorithm (with the ACE prefix added)":
The parenthetical seems to suggest that addition or not of the ACE 
prefix is an (optional) part of the Punycode algorithm, but RFC 3492 
does not define the prefix, nor is the addition of the prefix part of the 
Punycode algorithm. -> Convert parenthetical to a clause or sentence
("... and then adding the ACE prefix." or so).
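
The separation of the two steps can be sketched in Python, whose built-in 
"punycode" codec implements the RFC 3492 encoding without any prefix. The 
function name to_a_label and the constant ACE_PREFIX are my own labels for 
illustration, and the sketch performs none of the validity tests from 5.4; 
it also assumes the input contains at least one non-ASCII character.

```python
ACE_PREFIX = "xn--"  # the IDNA ACE prefix; not part of RFC 3492 itself


def to_a_label(u_label: str) -> str:
    """Punycode-encode a label, then add the ACE prefix as a separate step.

    Illustrative only: no validity checks are made, and the input is
    assumed to contain at least one non-ASCII character.
    """
    # Step 1: the Punycode algorithm proper (RFC 3492), no prefix involved.
    encoded = u_label.encode("punycode").decode("ascii")
    # Step 2: prepend the ACE prefix.
    return ACE_PREFIX + encoded


print(to_a_label("b\u00fccher"))  # xn--bcher-kva
```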

5.5, rest from second sentence in para 1: As said in my comments on 
Section 4, a summary is unnecessary. Also, it has nothing to do with 
Punycode conversion. In addition, the second bullet point is confusing, 
because an A-label (checked or not) cannot be Punycode-converted again. 
-> remove

5.6: "That ... string" -> "The string resulting from the conversion in 
Section 5.5"

5.6: "That lookup" -> "The lookup"

5.7: What about (streamlined):
    Security Considerations for this version of IDNA are described in 
[IDNA2008-Defs], except for the special issues associated with right to 
left scripts and characters, which are discussed in [IDNA2008-BIDI].

8./9.: These should be merged. The text explains it all.

8.: "Hoffman and Costello ... should not be held responsible for any 
errors or omissions.": Remove, this is implicitly clear, in the end it's 
the WG and the IETF that's responsible. Similar for "As is usual with 
IETF specifications, while the document represents rough consensus, it 
should not be assumed that all participants and contributors agree with 
all provisions."

References [Unicode-RegEx], [Unicode-Scripts], [Unicode-UAX15] (and 
maybe others): Unicode data files don't have explicit authors, but 
Unicode TRs (and similar documents) have authors/editors, same as RFCs. 
Please don't drop this information.

Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp