Complex Problems (Was: Re: Discipline of Internet Protocol Engineering)

Dave Crocker dhc at dcrocker.net
Fri Jul 4 16:20:47 CEST 2003


John,

JCK> I think one needs to distinguish between "adequate understanding
JCK> of the problem" and "overly broad attempts at a solution".

Yes, thank you.  A very useful distinction.

Alas, *each one* needs to be balanced between taking too long and being
too hasty, I think.  (And then, of course, they need to be balanced with
each other... it's amazing we ever get it right.)  I believe we are
currently having problems with each of these.


JCK> when we take a complex _problem_ and say "well, we understand
JCK> the lower-right-corner of it, so let's 'solve' that" often leads 
JCK> us into a solution that doesn't integrate well with the other 
JCK> pieces when those come along.

Certainly there is the danger you cite, and certainly there are plenty
of examples of this sin being committed around the world -- very much
including within IETF.

If one takes a view that is too narrow, or too localized, then one is
likely (or certain) to commit this sin.

One could argue that the real, underlying magic of the IETF's success
has been the community's ability to hold the larger "problem
understanding" that is needed, without spending excessive time
developing it. The same has frequently been true for design of the
larger solution space. This probably has been possible because there has
been so much shared culture, so many skilled architects (including skill
with scaling), and a relatively small domain of discourse.

In terms of community proportions, I suspect none of these hold true
anymore. (I said proportions, not actual numbers.)  Hence, we cannot
rely on assumptions and shared culture.  We need to institutionalize
mechanisms that achieve similar results.  (yikes.)


JCK> Clearly, it is important to find a balance, since the quest for
JCK> comprehensive and perfect understanding of every problem would 
JCK> result in our never actually getting anything done

ack.


JCK> (if one were cynical, one might make that observation about this WG
JCK> as well as about the IETF's technical/engineering work).

or, worse, maybe *not* cynical...


JCK>   But "this 
JCK> piece will work, so let's do it" needs, IMO, to be carefully 
JCK> balanced with "and _what_ problem is it solving?".  And we need 
JCK> to have clear explanations and a reasonable consensus about the 
JCK> latter.   I suggest that failure to take that step has gotten us 
JCK> into a lot of trouble in some areas in the past and that may be 
JCK> where my views and Keith's intersect.

and even mine.


(*** Warning.  Folks --

   X.400 rathole section follows.

   Suggestion to the X.400/email technology-challenged: attend to the
   structure and process debate, without worrying whether either John or
   I are precisely correct about the facts. That is, treat it as a bench
   analysis of approaches, rather than an historical review...
***)


>> One only needs to look at OSI in general, and X.400 in
>> particular.
JCK> And I would suggest that the failure in X.400 was that they lost
JCK> track of the problem analysis (which was actually pretty good, 
JCK> IMO, in the early stages of that work)

not sure I agree on the assessment of the early stages. my recollection
is that X.400 tried, rather explicitly, to solve a whole array of human
messaging problems for which the constituency was largely theoretical,
and for which the solution(s) were very poorly understood.



JCK> They also left two or three important issues out of that problem 
JCK> analysis: a reasonable migration path from earlier systems

That was explicit. The world was explicitly expected simply to switch
over.

And therein lies an important lesson about installed base.  (How else
could anything as ugly as MIME prove so successful?)


JCK> (including, later on, a reasonable migration path from earlier 
JCK> versions of X.400)

Or, one could argue, that was the problem of making too darn many
changes to existing functionality, and issuing them before vendors could
recover their investment.

And let's not forget just how complicated the specs were/are.


JCK> and general usability (aggravated by the fact 
JCK> that there were already deployed systems that were easier to 
JCK> use).

Another lesson, methinks.


JCK> A possible additional important issue was that we already 
JCK> had deployed and successful experience with more or less peer to 
JCK> peer email systems, while the analysis that led to X.400 assumed 
JCK> a highly-structured "email provider" environment.

A rather striking point about this is that the core of the X.400 work
was done by folks with experience doing Arpanet stuff, though maybe not
much with email.  Hmmmm.



>> The usual logic for explaining this problem is that the
>> complexity of juggling all the issues, across the entire
>> service, simply buries the development effort.
JCK> No disagreement on this.   But it is possible to do modular
JCK> development against a systems-level problem understanding.  The 
JCK> only additional requirement that introduces is that the 
JCK> little-piece development proposals must be examined against a 
JCK> "does this foul up anything else in the system" criterion.  And 
JCK> that is, IMO, one which we have too often managed to bypass.

and I agree with that, certainly.

we can, of course, see the great, big, looming trap that this permits us
to fall into, however, if we are not extremely careful.

And one could argue that we have fallen into it quite a few times in
recent years.



>> and 2) it
>> permits turning the crank on the specification engine more
>> quickly and more frequently. This means that we get
>> operational experience more quickly and can then, quickly,
>> refine the specifications to match actual field knowledge.
JCK> Again, up to a point.  I would have said "until the boundaries
JCK> of the most problematic parts are understood sufficiently to be 
JCK> reasonably confident that they really are isolated".

My point was that enforcing dependencies between working groups ensures
that none of them get their work done until the slowest is done.

I believe your point pertained to the abstraction of what is needed to
ensure that divide and conquer can proceed *without* the project
management (ie, timing) dependencies.



>> And it means that it is quite a
>> few years before there is any feedback from the user
>> community;.
JCK> I think one can argue that point into a "lose either way"
JCK> situation, so it is important to strike an appropriate balance.

yup.


JCK> From the other perspective, one could claim that the "smaller
JCK> pieces" approach --without adequate problem analysis-- can lead 
JCK> to quick and effective feedback from the user community about 
JCK> those specific pieces, but that their deployment may then make 
JCK> it essentially impossible to solve the more difficult problems 
JCK> by constraining possible solutions to them down to a null set.

the most essential bit of feedback is whether an approach (and its
specification) are useful.  If they are, the feedback means there is an
installed base, and it must be respected henceforth.  If the feedback
says 'not useful', then we get to start over.

Most of the time, for any interesting problem, this carries a very real
requirement: that we must deploy components of a larger solution prior
to fully understanding whether the divide-and-conquer component-ization
has been done well enough.

I believe it is a) the fear of getting this wrong, and/or b) actually
getting it wrong, that cause the massive delays we experience for these
complex problems.


>> Is IP a failure?
JCK> But IP, and the post-split TCP/IP, started from, I think, a 
JCK> fairly comprehensive understanding of what problems people were 
JCK> trying to solve.   Your example, along with the email 
JCK> transfer/content split, has more to do with effective 
JCK> modularization of the solution(s) than it does with a particular 
JCK> style of development.

With TCP/IP, perhaps.  With email, I believe not.

Frankly I would say that email was extremely accidental, in its
architecture. Yes, the FTP mail commands were planned from the start,
but standardizing the headers was done 5 years later, a separate email
transfer protocol done 5 years after that, and the effort to make the
body support multi-media content was done yet another 10 years later.

Email addressing was no better. Ray Tomlinson just stuck the @host onto
an existing structure. The architectural "brilliance" that came from
such incremental growth was that the rest of the net was not allowed to
know anything about the left-hand side (the mailbox) but only the domain
of control (the host). (And let me explain that I put the quotation
marks around brilliance not to diminish the importance, but to highlight
that this kind of excellent work is marked by doing less, not more.)

This left an extremely powerful back door for addressing extensions. By
contrast, X.400 suffered very seriously because it insisted that there
be global semantics to the whole string. (And -- surprise! -- we get
into trouble, today, when relays commit the sin of thinking they know
something about the syntax or semantics of the left-hand side.)
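
To make the contrast concrete, here is a small illustrative sketch
(mine, written for this note; it is not code from any real MTA, and the
X.400 attribute names are just stand-ins) of the two routing
philosophies.  The Internet-mail relay splits at the final '@' and
routes on the domain alone, leaving the mailbox opaque; the X.400-style
relay must understand every globally-defined attribute of the address
before it can forward anything.

# Illustrative sketch only; not from any real MTA.  It contrasts the
# Internet-mail rule (the relay may look only at the domain; the mailbox
# is opaque) with an X.400-style address in which every attribute
# carries global semantics each intermediary is expected to understand.


def route_internet_address(addr):
    """Return the routing key for an RFC 822-style mailbox@domain address.

    The relay splits at the final '@' and uses only the right-hand side.
    The left-hand side (the mailbox) passes through untouched, which is
    what leaves the "back door" open for later extensions.
    """
    mailbox, sep, domain = addr.rpartition('@')
    if not sep or not mailbox or not domain:
        raise ValueError("not a mailbox@domain address: %r" % addr)
    return domain.lower()   # route on the domain of control only


def route_x400_address(attrs):
    """X.400-flavored routing (attribute names are illustrative).

    Every attribute of the O/R address (country, ADMD, PRMD, ...) is
    globally meaningful, so a relay must understand the whole structure
    before it can forward anything.
    """
    required = ("C", "ADMD", "PRMD", "O", "S")   # assumed attribute set
    missing = [k for k in required if k not in attrs]
    if missing:
        raise ValueError("cannot route; missing attributes: %s" % missing)
    return "/".join("%s=%s" % (k, attrs[k]) for k in required)


if __name__ == "__main__":
    # The relay never looks inside "dcrocker+lists"; extensions such as
    # sub-addressing ride through for free.
    print(route_internet_address("dcrocker+lists@brandenburg.com"))

    # The X.400-style relay needs the whole, globally defined structure.
    print(route_x400_address({"C": "us", "ADMD": "attmail",
                              "PRMD": "example", "O": "brandenburg",
                              "S": "crocker"}))

The point of the sketch is only that the first function never has to
change when someone invents a new left-hand-side convention
(sub-addressing, say); the second one does.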

I haven't asked Ray about his approach to addressing, and believe he was
simply doing the minimum necessary to get the thing to work. I do not
believe he had any sort of grand, global scheme for the entire address.
My recollection is that the work on RFC733, in 1977, also did not get
more "sophisticated" with respect to the overall addressing model,
although we did add a feature for source-routing that proved useless.
And when RFC822 added domain name structure to the right-hand side, it
was again a small modification to a localized part of the architecture.

Clearly, MIME worked the same way.

So, none of this came from grand planning, at any point along the way,
in my opinion.

It did, however, benefit from some type of consistency in architectural
thinking at each incremental effort. Maybe. But I am hard-pressed to
claim it was even conscious or "coordinated".


JCK> Indeed, if one goes back and examines the 
JCK> history of the transfer/content split in email, a case can be 
JCK> made that, as with IP, the two started out fairly integrated and 
JCK> were split up after the advantages of that became clear.

I completely disagree.

The entire process was highly fragmented and asynchronous.


JCK> To the extent to which that is true, neither one is a good example
JCK> of _design_ or _development_ on a "small pieces" basis: rather,
JCK> they are examples of redefinition of the problem and
JCK> remodularization of the solution after the initial design and
JCK> development were complete.

Nope.  For email, definitely disagree.


>> In fact, this highlights one of the other, surprising benefits
>> of a divide-and-conquer approach, namely that it permits
>> post-hoc additions that were not planned.
JCK> Again, I suggest that the advantages you are claiming here --and
JCK> with which I agree-- are due to effective modularization, rather 
JCK> than a particular development approach (such as "divide and 
JCK> conquer").

I think that "effective modularization" and "divide and conquer" are the
same thing...

Hmmmm. I suspect I am using the term divide and conquer more loosely
than you. (That got me in trouble on my Ph.D. qualifying exam, and guess
what degree I did not get...)


JCK> And I believe we are more likely to get the 
JCK> modularization right if we have a moderately complete 
JCK> understanding of the problem --a systems analysis of properties, 
JCK> relationships, and interactions-- rather than if we start 
JCK> carving off pieces of the problem piecemeal on the assumption 
JCK> that we can always fit things back together later.

The bottom line is that I believe there needs to be good consideration
of the larger issues, too.

The question comes back to those matters of 'balance' and 'skill'.

So the real question is what can we do with these larger problem spaces
to facilitate small, rapid development of independently-useful
components that will fit into a larger scheme and support a reasonable
degree of system scaling?

(When we answer that, there are two activities we should pursue next.
One involves the purchase of a certain bridge into New York City and the
other involves world peace.)


d/
--
 Dave Crocker <mailto:dcrocker at brandenburg.com>
 Brandenburg InternetWorking <http://www.brandenburg.com>
 Sunnyvale, CA  USA <tel:+1.408.246.8253>, <fax:+1.866.358.5301>


