Proposed document structure and content for IDNA
John C Klensin
klensin at jck.com
Sat Jul 12 14:58:01 CEST 2008
--On Thursday, 10 July, 2008 10:37 -0700 Paul Hoffman
<phoffman at imc.org> wrote:
>...
> There are three large problems with the structure and content
> of the current set of documents; fortunately, all are easy to
> fix at the same time.
>...
> In summary, we should:
>
> - Merge the four documents into one (keeping all the authors,
>   of course)
> - Completely strip out the commentary / history / rationale
> - Move forwards with one concise protocol document
>
> Thoughts?
Paul,
Let me give you my take on this, with the understanding that
most of it has been said before by others.
I'm going to refer to the four existing documents simply as
Tables, Bidi, Protocol, and Rationale below. Lower-case uses
of the same words are just words. I trust that won't confuse
anyone.
This is a long note. I believe that the issues are more
complex than one might infer from your message, that they are
worth careful examination, and that there is no way to do that
in a short message. I hope you and others will take the time
to read it.
First of all, there are, to me, really three separate issues in
your proposal:
(1) Do we need the rationale, history, or explanation
material?
(2) Can and should we separate that rationale, etc.,
material from the specifically-protocol and
implementation material?
(3) How should the protocol material be organized?
It may even be worth breaking these into separate threads, but
let's see how the discussion goes.
I want to take those with the last one first, since I think
that, in part, we may be just having a simple misunderstanding
about it. So...
--------------------
(3) How should the protocol material be organized?
Bidi was originally separate from Protocol (called "Issues"
at the time) simply as a "division of labor" matter. There
are some related "division of expertise" issues -- really
understanding Bidi requires a rather different knowledge
base than really understanding the rest of the protocol (and
table) pieces and there are some extremely complex tradeoffs
in that work. In particular, my hope is that our colleagues
in the RtoL script community will continue to study the Bidi
document very carefully, not because I want to keep them
away from anything else, but because they have special
first-hand expertise and perspective on the implications of
various alternate choices in that area that most of the rest
of us lack. I think that focused examination is facilitated
by keeping the documents separate until we have fairly
stable consensus about the substantive issues in both.
However, I actually see some significant advantages to
combining those two documents before RFC publication, as a
more or less purely editorial matter. Most of those
advantages parallel comments you, Mark, and others have made
about not having to look at too many documents at the same
time. It would be a bit of extra work at that stage, but
I think it would be worth it in the long run. I believe
there is a strong pragmatic case for publishing them
separately for Proposed Standard and then pulling them
together, just to save time, but my personal belief is that
the tradeoffs work out to combining them now.
If Protocol and Bidi are combined, then the question is
whether the information in Tables should be combined with
them (or, if they are left apart, whether Tables should be
folded into Protocol). There, I think we reason from
different experience. You believe that we will see a larger
number of implementations, and more accurate ones, if there
is a single, unified, large document. I believe that very
large documents tend to cause people to glaze over and start
skimming, resulting in poorer and less interoperable
implementations because details are missed. While I think
there are elements of truth in both of those perspectives, I
can't offer any clear rule that would resolve the tradeoffs.
But there is another issue for the relationship between
protocol specifications and tables, and Debbie, Mark, and
others have identified important aspects of it over the last
months. While I think we have reached consensus that
changes to the tables themselves should require new RFCs, I
think we have also agreed that the essence of an approach
that is agnostic about Unicode-version includes a long-term
effort toward rules that permit as much version upgrading
as possible to be done as an IANA process, rather than
requiring the often time-consuming and painful WG and RFC
process. The very fact that we have had trouble keeping
this WG active and focused since IETF 71 reinforces my view
that we need to move in that direction.
I think the goal of version-independence strongly suggests a
long-term objective of a new version that should not require
code changes, at least to the basic protocol implementation,
but simply the loading of new tables (some of which, given
the context-dependent characters, may be rules requiring
interpretation, but still expressible as tables). And that,
in turn, whether one seeks precedents in the LTRU work or
elsewhere, argues very strongly for keeping the core
protocol specification material and the core table material
in separate documents that can be developed asynchronously
(and maybe even with different pools of expertise) in the
future. We may not achieve that goal in the first revision
or two, but I think it is important to lay foundations
for this work that are as stable, adaptable, and predictable
as possible in the future.
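As an illustration of the table-loading idea above, a minimal sketch in
Python follows. Everything in it is hypothetical: the table layout, the
status labels (ALLOWED, CONTEXT, DISALLOWED), the function names, and
the sample entries are assumptions made for the example and are not
taken from Tables or Protocol. The point is only that the lookup code
stays the same while a newer table is swapped in as data.

```python
from bisect import bisect_right

# Hypothetical table format: sorted, non-overlapping
# (first_code_point, last_code_point, status) ranges.
# The entries are illustrative only, NOT real IDNA table data.
TABLE_V1 = [
    (0x002D, 0x002D, "ALLOWED"),   # hyphen
    (0x0030, 0x0039, "ALLOWED"),   # digits
    (0x0061, 0x007A, "ALLOWED"),   # lower-case ASCII letters
]

# A later (hypothetical) table version adds entries; no code changes.
# "CONTEXT" marks code points that would need a contextual rule.
TABLE_V2 = TABLE_V1 + [(0x200C, 0x200D, "CONTEXT")]


def lookup(table, code_point):
    """Return the status of a code point; unlisted ones are DISALLOWED."""
    starts = [row[0] for row in table]
    i = bisect_right(starts, code_point) - 1
    if i >= 0 and table[i][0] <= code_point <= table[i][1]:
        return table[i][2]
    return "DISALLOWED"


def check_label(table, label):
    """List (character, status) for every character that is not ALLOWED."""
    problems = []
    for ch in label:
        status = lookup(table, ord(ch))
        if status != "ALLOWED":
            problems.append((ch, status))
    return problems
```

Calling check_label(TABLE_V1, label) and check_label(TABLE_V2, label)
on the same label exercises the same code against two table versions;
only the data differs. A real implementation would also have to
evaluate the contextual rules themselves, which this sketch leaves out.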
That conclusion doesn't lead me to believe that the
distribution of material between future versions of Tables
and Protocol is exactly right. We have already concluded (I
think) that, once the contextual rule specifics are
moderately stable, they should be part of Tables (or a
separate document, for which I am _not_ going to argue)
rather than Protocol (now) or Rationale (earlier). There
may very well be material in the rule-generating parts of
Tables that should be moved into Protocol (I'm not sure, and
can't suggest a good rule for finding the boundary right
now, but I think it is worth examining).
If we decide to keep Protocol and Tables separate, we
probably need to examine each for material that should be in
the other. We should, however, do so while keeping in mind
the time-delay consequences of spending time fine-tuning and
shuffling text around. Unlike the Bidi case, my personal
intuition is that, unless there are glaring mislocations
(and I think there is one with the Contextual Rule material and
have said so in the past), it is not worth spending a lot of
time parsing those two documents differently for Proposed
Standard. If nothing else, Bidi moves more or less in one
chunk while, if the dividing line is "stable at the same
level as protocol specification" versus "likely to change
with Unicode versions", Tables would have to be analyzed
paragraph by paragraph, which is a much more time-consuming
process.
--------------------
(1) Do we need the rationale, history, or explanation material?
In most IETF situations, I'm extremely sympathetic to the
"focus on the protocol implementers and just tell them what
to do" approach that I understand you to be advocating,
with explanations left to personal-opinion documents
published outside the IETF process. I think IDNA is
different for several reasons and that those reasons
require the type of material that now appears in Rationale
if we want the effort to succeed. I note that we have
experience ("running code") on several of the issues I list
below; they are not entirely speculation.
(i) IDNA is a client-side protocol, not a "wire" one. As
such, it offers far more incentives and opportunities
for local variations than there are with the typical
client-server or peer to peer protocol, especially at
the lower layers of the stack. Unless we can find the
Protocol Police and get them to enforce IETF decisions
and whims that contribute to interoperability, we need
to get people to understand the reasoning behind the
protocol and the reasons why they should conform.
Those explanations are much more effective if they are
authoritative (i.e., from the IETF) rather than stated
as personal opinions or even as "not reviewed".
(ii) We have already seen large numbers of folks who are
willing to sacrifice DNS interoperability to their own
parochial concerns (especially when they are in the
"Names Business" -- a concept I still find horrifying,
but that battle was lost long ago -- or the
"alternate DNS protocol" business) and for whom money is
the key concern. Again, we need to explain, as
carefully as possible, why it is important, and
ultimately to their advantage, to conform to the
specifications or we need to assume that some of them
won't conform simply because they don't understand
(those who understand and don't care are another
matter... and hopeless).
(iii) Our audience is not just protocol implementers,
whether we like that or not. Decisions we make
interact with policy decisions. Good behavior and
interoperability with IDNA2003 is heavily dependent on
good behavior by zone administrators ("registries") but
that is explicit only in a rarely-seen IESG note. The
current IDNA2008 drafts make that dependency explicit.
We don't want to get into the business of trying to
make the policies, but, unless we provide a clear
foundation and narrative rationale for people who have
to think about those policies, we should expect the
technical issues to be ignored entirely (and have more
than enough evidence of that behavior already).
Tina and Cary have, I believe, both sent notes to the
list on this subject that are much more eloquent than
my summary in the paragraph above.
At the risk of opening up another can of worms, I suggest that
we have another recent worked IETF-related example of a
situation in which publishing the theory and rationale in a
visible and authoritative way is as important as the
specification itself. RFC 3245 notwithstanding, ENUM is
largely defined as "just do this", partially because the guilty
parties (largely Patrik, Richard Shockey, and myself) thought
it completely obvious that the whole model was of dubious value
unless we had a single tree. Proper functioning, and effective
user-level interoperability, of ENUM depends heavily on
registry and operator behavior, just like IDNA. But we never
explained the issues and thinking clearly enough in a public
place, a number of actors didn't get the point, and here we
are, back to the point at which the user at one endpoint
largely has to know who services the entity they are trying to
contact in order to look the contact information up. There are
cases in which adopting a "no rationale or explanations"
position really does have serious negative impact on
interoperability.
I agree that it may be hard to get agreement on the text that
explains some of our reasoning and that there may be different
perspectives on the reasoning itself. I don't think that
problem is likely to be as difficult as you (Paul) expect,
especially if we can all work together in good faith to get
these documents out (which I do expect we will continue to be
able to do). And, for the reasons above, I think it is worth
it.
--------------------
(2) Can and should we separate that rationale, etc., material
from the specifically-protocol and implementation material?
This question is obviously irrelevant if we decide to drop the
explanatory material and the Rationale document itself.
However, if the discussion in (1) above, and other
considerations, are accepted, it is worth a little discussion.
A different way of stating the question is whether, if we keep
the Rationale document, we should strive to eliminate all
normative references from Tables, Bidi (if it isn't merged with
Protocol), and Protocol.
Certainly that would make it easier for the protocol
implementer, and we agree that is an important consideration.
However, moving all of the material out, especially the
definitions, would mean that the other audiences would have to
dig into the protocol materials in order to read the
explanations. I think that is a poor tradeoff, something I
can elaborate on at more length if necessary.
That does not imply that I believe that the balance of material
between Rationale and the other documents is exactly right. As
you know, Protocol started out as part of Rationale/Issues (a
good idea for initial development but we all agree it would
have been a bad one for the long term). If things need to be
moved around, let's move them around. I hope we can do that
with careful consideration for the balance between perfection
and completion time, but I do think we should take advantage of
any low-hanging fruit.
--------------------
Specific alternate recommendations:
(a) We retain the rationale and explanatory material, and
retain it in a separate document.
(b) We plan to merge Bidi and Protocol, but do so only after we
have substantial agreement on the technical elements of both
documents.
(c) We retain the separation between Tables and Protocol to
facilitate updating of the former and, conversely, to drive
home the stability of the latter.
(d) We move the "Contextual Rules" material from Protocol to
Tables. Patrik and I have just had a mini-discussion about
this and propose that we make that transition in the first pair
of drafts after IETF (we would have done it now but for the
risk of making confusing mistakes that could not be corrected
before the posting deadline).
(e) We all look for additional places where sections of text
could conveniently be moved among the documents in ways that
would improve clarity and ease of reading, keeping in mind that
we have multiple audiences and that an optimal organization for
one might not be optimal for another.
regards,
john