Proposed document structure and content for IDNA

Sat Jul 12 14:58:01 CEST 2008

--On Thursday, 10 July, 2008 10:37 -0700 Paul Hoffman
<phoffman at imc.org> wrote:

>...
> There are three large problems with the structure and content
> of the current set of documents; fortunately, all are easy to
> fix at the same time.
>...
> In summary, we should:
> 
> - Merge the four documents into one (keeping all the authors,
>   of course)
> - Completely strip out the commentary / history / rationale
> - Move forwards with one concise protocol document
> 
> Thoughts?

Paul,

Let me give you my take on this, with the understanding that
most of it has been said before by others.

I'm going to refer to the four existing documents simply as
Tables, Bidi, Protocol, and Rationale below.  Lower-case uses
of the same words are just words. I trust that won't confuse
anyone.

This is a long note.   I believe that the issues are more
complex than one might infer from your message, that they are
worth careful examination, and that there is no way to do that
in a short message.  I hope you and others will take the time
to read it.

First of all, there are, to me, really three separate issues in
your proposal:

	(1) Do we need the rationale, history, or explanation
	   material?

	(2) Can and should we separate that rationale, etc.,
	   material from the specifically-protocol and
	   implementation material?

	(3) How should the protocol material be organized?

It may even be worth breaking these into separate threads, but
let's see how the discussion goes.

I want to take those with the last one first, since I think
that, in part, we may be just having a simple misunderstanding
about it.  So...

  --------------------

(3) How should the protocol material be organized?

   Bidi was originally separate from Protocol (called "Issues"
   at the time) simply as a "division of labor" matter.  There
   are some related "division of expertise" issues -- really
   understanding Bidi requires a rather different knowledge
   base than really understanding the rest of the protocol (and
   table) pieces, and there are some extremely complex tradeoffs
   in that work.  In particular, my hope is that our colleagues
   in the RtoL script community will continue to study the Bidi
   document very carefully, not because I want to keep them
   away from anything else, but because they have special
   first-hand expertise and perspective on the implications of
   various alternate choices in that area that most of the rest
   of us lack.  I think that focused examination is facilitated
   by keeping the documents separate until we have fairly
   stable consensus about the substantive issues in both. 

   However, I actually see some significant advantages to
   combining those two documents before RFC publication -- as a
   more or less purely editorial matter -- most of those
   advantages parallel comments you, Mark, and others have made
   about not having to look at too many documents at the same
   time.    It would be a bit of extra work at that stage, but
   I think it would be worth it in the long run.   I believe
   there is a strong pragmatic case for publishing them
   separately for Proposed Standard and then pulling them
   together, just to save time, but my personal belief is that
   the tradeoffs work out to combining them now.

   If Protocol and Bidi are combined, then the question is
   whether the information in Tables should be combined with
   them (or, if they are left apart, whether Tables should be
   folded into Protocol)?  There, I think we reason from
   different experience.  You believe that we will see a larger
   number of implementations, and more accurate ones, if there
   is a single, unified, large document.  I believe that very
   large documents tend to cause people to glaze over and start
   skimming, resulting in poorer and less interoperable
   implementations because details are missed.  While I think
   there are elements of truth in both of those perspectives, I
   can't offer any clear rule that would resolve the tradeoffs.

   But there is another issue for the relationship between
   protocol specifications and tables, and Debbie, Mark, and
   others have identified important aspects of it over the last
   months.  While I think we have reached consensus that
   changes to the tables themselves should require new RFCs, I
   think we have also agreed that the essence of an approach
   that is agnostic about Unicode version includes a long-term
   effort toward rules that permit as much version upgrading
   as possible to be done as an IANA process, rather than
   requiring the often time-consuming and painful WG and RFC
   process.   The very fact that we have had trouble keeping
   this WG active and focused since IETF 71 reinforces my view
   that we need to move in that direction.   

   I think the goal of version-independence strongly suggests a
   long-term objective of a new version that should not require
   code changes, at least to the basic protocol implementation,
   but simply the loading of new tables (some of which, given
   the context-dependent characters, may be rules requiring
   interpretation, but still expressible as tables).  And that,
   in turn, whether one seeks precedents in the LTRU work or
   elsewhere, argues very strongly for keeping the core
   protocol specification material and the core table material
   in separate documents that can be developed asynchronously
   (and maybe even with different pools of expertise) in the
   future.  We may not achieve that goal in the first revision
   or two, but I think it is important to laying foundations
   for this work that are as stable, adaptable, and predictable
   as possible in the future.
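
   As a rough picture of that separation, here is a minimal
   sketch, not anything taken from the drafts.  The table layout,
   the category names ("ALLOWED", "CONTEXT", "DISALLOWED"), the
   function names, and the sample entries are all assumptions
   made up for illustration.  The point is only that the checking
   code stays fixed while the table (and any contextual rules
   keyed off it) is data that can be replaced.

```python
from bisect import bisect_right

# Hypothetical table format: sorted (first, last, category) code point ranges.
# These entries are illustrative only and are NOT taken from the Tables draft.
TABLE_V1 = [
    (0x002D, 0x002D, "ALLOWED"),  # hyphen-minus
    (0x0030, 0x0039, "ALLOWED"),  # ASCII digits
    (0x0061, 0x007A, "ALLOWED"),  # lowercase ASCII letters
    (0x200C, 0x200D, "CONTEXT"),  # acceptable only if a contextual rule passes
]


def make_classifier(table):
    """Build a lookup function from a sorted range table."""
    starts = [first for first, _, _ in table]

    def classify(cp):
        # Find the last range whose start is <= cp, then check its end.
        i = bisect_right(starts, cp) - 1
        if i >= 0:
            first, last, category = table[i]
            if first <= cp <= last:
                return category
        return "DISALLOWED"  # anything not listed is rejected

    return classify


def label_is_acceptable(label, classify, context_rules):
    """The 'protocol' part: this code does not change when tables change."""
    for pos, ch in enumerate(label):
        category = classify(ord(ch))
        if category == "ALLOWED":
            continue
        if category == "CONTEXT":
            rule = context_rules.get(ord(ch))
            if rule is not None and rule(label, pos):
                continue
        return False
    return True


# A later "version" is just a new table; the code above is untouched.
TABLE_V2 = sorted(TABLE_V1 + [(0x00E0, 0x00F6, "ALLOWED")])

# Placeholder contextual rule (an assumption, not a real IDNA rule):
# the character may not appear first in the label.
RULES = {0x200D: lambda label, pos: pos > 0}
```

   With this arrangement, moving from TABLE_V1 to TABLE_V2 changes
   which labels pass without touching label_is_acceptable(), which
   is the property that argues for developing the two kinds of
   material in separate documents.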

   That conclusion doesn't lead me to believe that the
   distribution of material between future versions of Tables
   and Protocol is exactly right.  We have already concluded (I
   think) that, once the contextual rule specifics are
   moderately stable, they should be part of Tables (or a
   separate document, for which I am _not_ going to argue)
   rather than Protocol (now) or Rationale (earlier).  There
   may very well be material in the rule-generating parts of
   Tables that should be moved into Protocol (I'm not sure, and
   can't suggest a good rule for finding the boundary right
   now, but I think it is worth examining).  

   If we decide to keep Protocol and Tables separate, we
   probably need to examine each for material that should be in
   the other.  We should, however, do so while keeping in mind
   the time-delay consequences of spending time fine-tuning and
   shuffling text around.  Unlike the Bidi case, my personal
   intuition is that, unless there are glaring mislocations
   (and I think there is one with the Contextual Rule material and
   have said so in the past), it is not worth spending a lot of
   time parsing those two documents differently for Proposed
   Standard.  If nothing else, Bidi moves more or less in one
   chunk while, if the dividing line is "stable at the same
   level as protocol specification" versus "likely to change
   with Unicode versions", Tables would have to be analyzed
   paragraph by paragraph, which is a much more time-consuming
   process.

  --------------------

(1) Do we need the rationale, history, or explanation material?

	In most IETF situations, I'm extremely sympathetic to the
	"focus on the protocol implementers and just tell them what
	to do" approach that I understand you to be advocating,
	with explanations left to personal-opinion documents
	published outside the IETF process.   I think IDNA is
	different for several reasons and that those reasons
	require the type of material that now appears in Rationale
	if we want the effort to succeed.   I note that we have
	experience ("running code") on several of the issues I list
	below; they are not entirely speculation.

	(i) IDNA is a client-side protocol, not a "wire" one.  As
	    such, it offers far more incentives and opportunities
	    for local variations than there are with the typical
	    client-server or peer-to-peer protocol, especially at
	    the lower layers of the stack.  Unless we can find the
	    Protocol Police and get them to enforce IETF decisions
	    and whims that contribute to interoperability, we need
	    to get people to understand the reasoning behind the
	    protocol and the reasons why they should conform.
	    Those explanations are much more effective if they are
	    authoritative (i.e., from the IETF) rather than stated
	    as personal opinions or even as "not reviewed".

	(ii) We have already seen large numbers of folks who are
	    willing to sacrifice DNS interoperability to their own
	    parochial concerns (especially when they are in the
	    "Names Business" -- a concept I still find horrifying,
	    but that battle was lost long ago -- or the
	    "alternate DNS protocol" business) and for whom money
	    is the key concern.  Again, we need to explain, as
	    carefully as possible, why it is important, and
	    ultimately to their advantage, to conform to the
	    specifications, or we need to assume that some of them
	    won't conform simply because they don't understand
	    (those who understand and don't care are another
	    matter... and hopeless).

	(iii) Our audience is not just protocol implementers,
	    whether we like that or not.  Decisions we make
	    interact with policy decisions.  Good behavior and
	    interoperability with IDNA2003 is heavily dependent on
	    good behavior by zone administrators ("registries") but
	    that is explicit only in a rarely-seen IESG note.  The
	    current IDNA2008 drafts make that dependency explicit.
	    We don't want to get into the business of trying to
	    make the policies, but, unless we provide a clear
	    foundation and narrative rationale for people who have
	    to think about those policies, we should expect the
	    technical issues to be ignored entirely (and have more
	    than enough evidence of that behavior already).

	    Tina and Cary have, I believe, both sent notes to the
	    list on this subject that are much more eloquent than
	    my summary in the paragraph above.

At the risk of opening up another can of worms, I suggest that
we have another recent worked IETF-related example of a
situation in which publishing the theory and rationale in a
visible and authoritative way is as important as the
specification itself.  RFC 3245 notwithstanding, ENUM is
largely defined as "just do this", partially because the guilty
parties (largely Patrik, Richard Shockey, and myself) thought
it completely obvious that the whole model was of dubious value
unless we had a single tree.  Proper functioning, and effective
user-level interoperability, of ENUM depends heavily on
registry and operator behavior, just like IDNA.  But we never
explained the issues and thinking clearly enough in a public
place, a number of actors didn't get the point, and here we
are, back to the point at which the user at one endpoint
largely has to know who services the entity they are trying to
contact in order to look the contact information up.  There are
cases in which adopting a "no rationale or explanations"
position really does have serious negative impact on
interoperability.

I agree that it may be hard to get agreement on the text that
explains some of our reasoning and that there may be different
perspectives on the reasoning itself.  I don't think that
problem is likely to be as difficult as you (Paul) expect,
especially if we can all work together in good faith to get
these documents out (which I do expect we will continue to be
able to do).   And, for the reasons above, I think it is worth
it.

  --------------------

(2) Can and should we separate that rationale, etc., material
from the specifically-protocol and implementation material?

This question is obviously irrelevant if we decide to drop the
explanatory material and the Rationale document itself.
However, if the discussion in (1) above, and other
considerations, are accepted, it is worth a little discussion.

A different way of stating the question is whether, if we keep
the Rationale document, we should strive to eliminate all
normative references to it from Tables, Bidi (if it isn't merged with
Protocol), and Protocol.

Certainly that would make it easier for the protocol
implementer, and we agree that is an important consideration.
However, moving all of the material out, especially the
definitions, would mean that the other audiences would have to
dig into the protocol materials in order to read the
explanations.   I think that is a poor tradeoff, something I
can elaborate on at more length if necessary.

That does not imply that I believe that the balance of material
between Rationale and the other documents is exactly right.  As
you know, Protocol started out as part of Rationale/Issues (a
good idea for initial development but we all agree it would
have been a bad one for the long term).   If things need to be
moved around, let's move them around.  I hope we can do that
with careful consideration for the balance between perfection
and completion time, but I do think we should take advantage of
any low-hanging fruit.

  --------------------

Specific alternate recommendations:

(a) We retain the rationale and explanatory material, and
retain it in a separate document.

(b) We plan to merge Bidi and Protocol, but do so only after we
have substantial agreement on the technical elements of both
documents.

(c) We retain the separation between Tables and Protocol to
facilitate updating of the former and, conversely, to drive
home the stability of the latter.

(d) We move the "Contextual Rules" material from Protocol to
Tables.  Patrik and I have just had a mini-discussion about
this and propose that we make that transition in the first pair
of drafts after IETF (we would have done it now but for the
risk of making confusing mistakes that could not be corrected
before the posting deadline).

(e) We all look for additional places where sections of text
could conveniently be moved among the documents in ways that
would improve clarity and ease of reading, keeping in mind that
we have multiple audiences and that an optimal organization for
one might not be optimal for another.

     regards,
      john