Protocol-03 and updated issues list

Mon Jul 28 04:33:43 CEST 2008

Hi.
I'm attaching the updated copy of the Protocol issues list,
reflecting my response to Marcos (posted earlier) and some
additional notes.   The doc itself is now in the posting queue;
given the cutoff and some ambiguity (in my mind, probably not
the Secretariat's) about when it ends, I don't know when it will
go up.

I had hoped to get the parallel material on Rationale out at
much the same time, but am holding it until I can get into a bit
better shape with the text discussed in the note after next (a
response to Mark on "Interoperability").

    john
-------------- next part --------------
Issues list, IDNABIS Protocol  (as of 20080727)

Section numbers referenced are the same in both
draft-ietf-idnabis-protocol-01 and -02 except as noted.

There are several issues that, because of text moving back and
forth and suggestions for additional moves, are in the issues
list and summary for Rationale.  They are, in general, not
repeated here. 

Note that, unlike Rationale and some independent issues, no
specific "outstanding issues" list was posted for Protocol in
May.  That was due, in part, to the underwhelming response to
the other postings.  So this document incorporates what would
have been that list.

** P.1 ** Location of the Contextual Rules table

    This table is in the Appendices to Protocol at present; in
    previous versions it was in Rationale.  Based on list
    discussions, it should probably be moved to Tables,
	probably along with some additional material that is
	still in Rationale. 

    Status: Planned for the post-IETF72 versions of Protocol and
	Tables unless someone can get consensus behind a different
	model.

** P.2 **   Format of the Contextual Rules table.

    At present (Protocol-02), that table consists of a condensed
	format that more or less matches the format used for
	standard entries in Tables.  With it, important fields are
	separated by semicolons, potentially followed by comments
	that start with "#".  As an example, the first entry in the
	table looks like the following in Protocol-01:

	   002D; HYPHEN-MINUS; F;
		  Must not appear at the beginning or end of a label;
		  Regular expression:
		  [^^]\u002D|\u002D[^$] ;
		  # Note that a prohibition on having two hyphens as
		  the third and fourth characters of anything but a
	      valid A-label appears in the specification.

    Mark Davis suggested a different format.  He wrote (I have not
	preserved his formatting):

		I suggest that the table be formatted for clarity to not
		depend on whitespace -- using names for each field -- and be
		broken into a list of condition/result pairs.

		Code point: 200C
		Name:       ZERO WIDTH NON-JOINER
		Lookup:     True

		# Allow ZWNJ for breaking cursive connection, as needed in
		# Farsi.
		Before:     [[:Joining_Type=Dual_Joining:]
		            [:Joining_Type=Left_Joining:]]
		            [:Joining_Type=Transparent:]*
		After:      [:Joining_Type=Transparent:]*
		            [[:Joining_Type=Dual_Joining:]
		            [:Joining_Type=Right_Joining:]]
		Value:      PVALID

	Comment: There is no dependency on whitespace, at least in
	Protocol-01.  Mark's comments may have reflected an earlier
	version.   I've illustrated a version of this change with
	the alternate appendix (see elsewhere), but I'm not sure
	about it for two reasons.  The first is whether people
	prefer a more compact format or one that uses more vertical white
	space.  We hope that the Contextual Rules registry will remain
	small, but ending up with a significantly longer Tables
	document (remember that this section is normative) as the
	result of format may not be in our best interest.   The
	second point is related to the next issue.

    Status: To facilitate discussion, a variation on Mark's
	layout is reflected in the second (alternate) appendix
	(see P.3, below).  
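
    Illustration (a sketch, not text from either draft): the
    condensed format can be read without relying on whitespace by
    splitting on semicolons and treating everything after the first
    "#" as a comment.  The Python below assumes that neither ";" nor
    "#" occurs inside a field, which holds for the entry quoted above
    but is not guaranteed by anything in the draft.  The names
    ContextRule and parse_entry are hypothetical, and the second
    escape in the sample is read as \u002D.

```python
from dataclasses import dataclass


@dataclass
class ContextRule:
    code_point: int
    name: str
    flag: str          # third field, kept verbatim ("F" in the sample)
    description: str
    rule: str
    comment: str


def parse_entry(text: str) -> ContextRule:
    """Parse one condensed entry (hypothetical helper, not from the draft)."""
    # Everything after the first '#' is comment text.
    body, _, comment = text.partition("#")
    # Collapse whitespace so line breaks and indentation carry no meaning.
    fields = [" ".join(part.split()) for part in body.split(";")]
    # The sample ends its last field with ';', which leaves an empty tail.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    code_point, name, flag, description, rule = fields
    label = "Regular expression:"
    if rule.startswith(label):
        rule = rule[len(label):].strip()
    return ContextRule(int(code_point, 16), name, flag, description,
                       rule, " ".join(comment.split()))


# Transcribed from the HYPHEN-MINUS entry; second escape assumed to be \u002D.
SAMPLE = r"""
002D; HYPHEN-MINUS; F;
   Must not appear at the beginning or end of a label;
   Regular expression:
   [^^]\u002D|\u002D[^$] ;
   # Note that a prohibition on having two hyphens as
   the third and fourth characters of anything but a
   valid A-label appears in the specification.
"""

entry = parse_entry(SAMPLE)
print(hex(entry.code_point), entry.name, entry.rule)
```

    Because the fields are delimited only by ";" and "#", the same
    entry parses identically however it is wrapped or indented.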

** P.3 **  Definition of the Contextual Rules - Regex or otherwise

    Based on my understanding of what was being asked for, the formal
	definitions of the Contextual Rules were written to use a strict
	regular expression syntax with one regular expression per rule.
	That syntax, illustrated by the example above for U+002D,
	is not easy to look at or understand, but would lend itself
	to an automatic rule interpreter.  Mark's example is not
	one of a single rule, or even formal use of a regular
	expression.  If we are going to go that route, there may be
	even simpler ways to express the rules, leaving
	application implementers on their own for the
	formalizations to be used in their code.  A second appendix
	has been supplied in -02 as the beginning of a suggestion
	for discussion.

    Please examine the two examples above, the discussion in
	the text, and the forms used in both appendices and advise
	on what you would like to see and why. 

    Note that they are provided in these versions of the
	document for comparison purposes only.  Assuming we can
	agree on which one we want, only one will survive into the
	post-IETF versions of the documents. 

    Status: An alternate rule form is outlined in the temporary
	Appendix 2.
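
    Illustration (a sketch under stated assumptions, not a proposal
    for the registry format): if the rules were stated in prose or as
    condition/result pairs rather than as a single regular
    expression, an implementer might code them as ordinary
    predicates.  The Python below does that for the HYPHEN-MINUS rule
    as described in prose and for the Before/After conditions in
    Mark's ZWNJ example.  Python's unicodedata module does not expose
    Joining_Type, so JOINING_TYPE is a hand-made table covering three
    characters only; the function names are hypothetical.

```python
ZWNJ = "\u200C"

# Joining_Type for three characters only (assumption: a real
# implementation would load the full property from the Unicode
# Character Database instead of this hand-made table).
JOINING_TYPE = {
    "\u0628": "D",  # ARABIC LETTER BEH, Dual_Joining
    "\u0627": "R",  # ARABIC LETTER ALEF, Right_Joining
    "\u064E": "T",  # ARABIC FATHA, Transparent
}


def joining_type(ch: str) -> str:
    # Anything not in the table is treated as non-joining ("U").
    return JOINING_TYPE.get(ch, "U")


def hyphen_ok(label: str) -> bool:
    """HYPHEN-MINUS must not appear at the beginning or end of a label."""
    return not (label.startswith("-") or label.endswith("-"))


def zwnj_ok(label: str, i: int) -> bool:
    """Check the Before/After conditions for a ZWNJ at index i."""
    if label[i] != ZWNJ:
        raise ValueError("no ZWNJ at that index")
    # Before: (Dual_Joining | Left_Joining) Transparent*
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("D", "L"):
        return False
    # After: Transparent* (Dual_Joining | Right_Joining)
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("D", "R")


print(hyphen_ok("a-b"), zwnj_ok("\u0628\u200C\u0628", 1))
```

    The trade-off is the one described above: the predicates are easy
    to read, but each implementer writes and tests them separately
    instead of feeding a shared expression to a rule interpreter.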

** P.4 ** Protocol reference to Bidi Constraints 

   In 4.3.2.4: the bidi constraints apply to more than just
   single labels. 

   Comment: Noted.  The question of those Bidi constraints is
   probably one of the larger and more substantive open issues
   we face.

   Status: Discussion placeholder inserted, but we need to
   resolve the Bidi question before this text is adjusted to
   match.

** P.5 ** Bidi-checking Requirement on Lookup

   Should this be a SHOULD or a MUST?  

   Comment: See the discussion anchor and text in Section 5.5
   (note that this anchor has been in the document for some
   time and there have been no comments on-list).

   Status: Discussion anchor in text.

** P.6 ** Requirement for Policy

   Mark writes about Section 4.4: "While exact policies are not
   specified as part of IDNA2008 and it is expected that
   different registries may specify different policies, there
   SHOULD be policies." This SHOULD is pointless, unless some
   constraints or guidance are given. Otherwise my policy could
   be "any valid IDNA label", which would be precisely the same
   as no policy at all.

   Comment: I fully expect that some zone administrators will
   adopt exactly that sort of policy.  We've said that policy
   decisions, and consistent application of those decisions, by
   zone administrators are an important part of the
   registration model.  We have said that specifying those
   policies and determining their adequacy is not an IETF
   matter, but rather a matter for governments, enterprise
   management, ICANN, and so on.   We have seen application
   implementers evaluate per-zone policies and respond with
   decisions about what to display.  So, I don't think this is
   pointless.  The problem is whether different language would
   better describe the handoff.

   Of course, were we to decide that our audience is purely
   protocol implementers, all of this would go away.

** P.7 **  Universality of Unicode

   Section 5.2 says: "The local character set, character
   coding conventions, and, as necessary, display and
   presentation conventions, are converted to Unicode (without
   surrogates), paralleling the process described above in
   Section 4.2." 

   Mark writes: In the vast majority of cases in modern
   software, the local charset IS Unicode, so this may be
   confusing. Also, UTF-16 does and must use surrogate code
   units, so this needs to be more precise. And excluding
   surrogate code points isn't necessary since gc=Cs are
   forbidden anyway. Suggest:
   "The string is converted from the local character set into
   Unicode, if it is not already Unicode. The exact nature of
   this conversion is beyond the scope of this document, but
   may involve normalization, as described in Section 4.2."

   Comment: I don't know how to evaluate "vast majority...",
   but I keep running across examples and discussions that
   suggest that the presumed minority is fairly large.  The
   most recent example is a lengthy discussion about
   interoperability problems with email text body parts;
   problems that would presumably be infrequent or trivial if
   most systems primarily supported Unicode.

   Status: The specific textual change suggested has been
   made.  Anyone who objects should say so.
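
   Illustration (a sketch, not text from the draft): the suggested
   wording amounts to "decode if necessary, then possibly
   normalize".  The Python below assumes NFC as the normalization
   form and uses a hypothetical helper name, to_unicode.  It also
   shows the distinction Mark draws between surrogate code units
   (which UTF-16 must use) and surrogate code points (gc=Cs).

```python
import unicodedata


def to_unicode(raw: bytes, local_charset: str) -> str:
    """Convert a locally encoded string to normalized Unicode.

    NFC is an assumption made for this sketch; the quoted text only
    says the conversion "may involve normalization".
    """
    return unicodedata.normalize("NFC", raw.decode(local_charset))


def has_surrogate_code_point(text: str) -> bool:
    # General_Category Cs marks surrogate code points.
    return any(unicodedata.category(ch) == "Cs" for ch in text)


# A Latin-1 input and a decomposed UTF-8 input.
print(to_unicode(b"b\xfccher", "latin-1"))
print(to_unicode("e\u0301".encode("utf-8"), "utf-8"))

# U+10000 is one code point with no surrogate code points, yet its
# UTF-16 form takes two 16-bit surrogate code units (four bytes).
supplementary = "\U00010000"
print(has_surrogate_code_point(supplementary),
      len(supplementary.encode("utf-16-be")))
```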

** P.8 **  Validation of A-labels

   Section 5.4 says: "In general, that conversion and testing
   should be performed if the domain name will later be
   presented to the user in native character form (this
   requires that the lookup application be IDNA-aware)."

   Mark writes: Suppose that program X creates an A-Label from
   a U-Label, then sends that A-Label to program Y, which sends
   it to program Z, which sends it to program W, which displays
   it.  It sounds like each of Y, Z, W need to validate. Is
   that the intent of this text? If it is only W that needs to
   validate, then it gets a bit murky in today's world, where
   the boundaries between cooperating processes and programs
   are very fuzzy.

   Comment: That "murky" situation is exactly why the text
   leaves a lot of judgment in the hands of the application.
   Note that the sentence after the one you quoted now says, in
   part, "others may treat the string as opaque to avoid the
   additional processing at the expense of providing less
   protection and information to users", which is intended to
   be a very clear statement that there is a trade off
   involved.  I believe that, if you read the surrounding text,
   you will find that the specific answer to your question is
   that W should (note lower case) validate unless it has some
   reason not to.  Note that W has to do most of the work of
   validation in order to display the string anyway, and that
   one such reason would be knowing, as a consequence of system
   design and of adequate checking and auditing, that it would
   not have received the string unless it was valid.
   Validation is optional, and less likely, for Y and Z, but
   their implementers may certainly do so if they are being
   cautious.

   That text is consistent, I think, with on-list discussions
   that seem to have concluded that we must be very careful
   about not imposing IDNA requirements on programs that are
   not IDNA-aware but that there are some serious spoofing, 
   abuse, and malware opportunities if programs simply assume
   the validity of strings that appear to be A-labels.

   Status: Suggestions for clearer text would be welcome; no
   substantive change between -01 and -02.
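
   Illustration (a sketch, not the procedure in the draft): one
   inexpensive check that program W could make before display is
   that the apparent A-label decodes and then re-encodes to the
   same string.  The Python below uses the standard library's
   "punycode" codec; the helper name a_label_to_u_label is
   hypothetical, and lowercasing before comparison is a
   simplification.  The round trip is only one part of validation
   and does not cover the other tests discussed in Section 5.

```python
from typing import Optional

ACE_PREFIX = "xn--"


def a_label_to_u_label(candidate: str) -> Optional[str]:
    """Return the decoded label if candidate round-trips, else None."""
    lowered = candidate.lower()  # simplification: compare case-insensitively
    if not lowered.startswith(ACE_PREFIX):
        return None
    try:
        decoded = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        reencoded = ACE_PREFIX + decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        # Not decodable as Punycode, or not ASCII to begin with.
        return None
    return decoded if reencoded == lowered else None


for label in ("xn--bcher-kva", "xn--bcher-kv", "example"):
    print(label, "->", a_label_to_u_label(label))
```

   A program in the position of Y or Z could skip this call and
   pass the string along as opaque, which is the trade-off the
   quoted sentence describes.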

** P.9 ** Use of "in parallel"

   Section 5.5 and elsewhere use the term "in parallel" to
   describe the relationship between two (or more) sets of
   steps or procedures.  Mark expresses concern that this will
   create confusion with concurrent operations, which is not
   intended.  He suggests other wording. 

   Comment: Specific suggestions for alternate text would be
   welcome.

** P.10 **  Similarity of registration and resolution procedures

   Mark points out that the steps in 5.5 are all the same as
   in 4.3 -- except for bidi. This fact should be very clear
   in the text. 

   See also P.15

   Comment: Of course, they were more different until the MAYBE
   categories were removed.  There still seems to be merit in
   describing them separately because implementers of
   registration procedures and actions and implementers of
   lookup ones tend to be different even if there is some
   shared code.    A comment could be inserted noting the
   parallelism (sic), but I'm not quite sure how or where to do
   that without encouraging people to get sloppy about the
   differences that do exist.

   Specific suggestions and discussion would be welcome.

** P.11 **  Placeholder: Description of steps in both lookup and
registration

	These two sections still need work.

	Status: Discussion anchor in text (anchor2 in -02).
	Not worth doing much with until some related issues are
	sorted out (e.g., whether these should be recombined, see
	above). Specific suggestions welcome.

** P.12 **  Placeholder: Preprocessing

	There is a lengthy placeholder in the text about
	preprocessing issues.  Even if the WG doesn't take on the
	task of standardizing preprocessing, there is a great deal
	of controversy as to whether it is necessary, whether it
	should be required, and whether it should be standardized
	down to the last mapping, or whether our goal is actually
	to change the processing model, not just the descriptive
	one.

    Comment: this needs to be resolved in the WG before the
	text in Protocol can be made internally consistent and
	consistent with Rationale.  Aspects of this topic have been
	discussed extensively in postings in recent days, including
	the Issues/Status report for Rationale-01.

** P.13 **  Labels starting in combining marks

    In Section 5.5 (lookup validation and testing), the text
	contains a prohibition on labels starting with combining
	marks. I think we have consensus on the prohibition.  Is
	the statement of it adequate?

    Status: There is a discussion anchor in the text.  It will
    be removed if there is no discussion on this in the near
	future.
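
    Illustration (a sketch, not the draft's wording): the
    prohibition can be tested with a single property lookup.  The
    Python below assumes that "combining mark" means any character
    whose General_Category is Mn, Mc, or Me; the function name is
    hypothetical.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character has General_Category M* (Mn, Mc, Me)."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


# U+0301 COMBINING ACUTE ACCENT at the start versus after a base letter.
print(starts_with_combining_mark("\u0301a"))
print(starts_with_combining_mark("a\u0301"))
```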

** P.14 ** (Editorial)  Text about A-labels on registration

    Should this text be moved from 4.1 to 4.3?

    Status: See placeholder in draft.

** P.15 ** Reducing duplication in "registration" and "lookup"

    The descriptions of steps in Sections 4 and 5 are very
	similar.  There are suggestions that the two be recombined
	(see Mark's notes from some time ago) and another one that
	we create a new section with the common material and point
	to it from the two existing (but shortened) sections (see
	Marcos's notes on Protocol).

    Comment: Keeping the two sections separate contributes,
	IMO, to the goal of making it easier for people to find the
	information they need to implement things correctly, so I'm
	nervous about recombining the sections.   Marcos's
	suggestion seems like a middle ground, although it will
	result in more page-flipping.   

    Status: We need some discussion and at least rough
	consensus on this if it is going to be changed.  Otherwise,
	the default is more or less status quo.

** P.16 **  Versions and the Contextual Rules

    Marcos suggested that the Contextual Rules Registry and the
	Derived Properties one have a formal version structure.  I
	may or may not understand the suggestion the way he
	intends, but, if I do, this is an Issue.

	Comment: The IETF has had poor success with version numbers
	in tables and the like, especially those that are intended
	for future compatibility, with the notorious "MIME-Version:
	1.0" header standing out as an example.  So I'm reluctant
	to do this unless we have a clear understanding of how we (and
	implementations) would use those versions, what error
	conditions we would expect and to whom they would be
	reported, etc.   Much more discussion and/or text needed.