Complex Problems (Was: Re: Discipline of Internet Protocol Engineering)

John C Klensin john-ietf at jck.com
Wed Jul 2 10:01:00 CEST 2003


Dave,

I've needed to let this set of messages season a bit in the hope 
of saying something coherent and not repeating myself too much.

I think one needs to distinguish between "adequate understanding 
of the problem" and "overly broad attempts at a solution".  For 
the latter, we are, I think, pretty much in agreement: attempts 
to engineer large, integrated, complex solutions have rarely 
been successful in the IETF (or, perhaps, anywhere else).  But 
taking a complex _problem_ and saying "well, we understand the 
lower-right corner of it, so let's 'solve' that" often leads 
us into a solution that doesn't integrate well with the other 
pieces when those come along.  That poor integration can 
constrain opportunities and solutions going forward or can force 
us toward incompatible changes, neither of which is good.

Clearly, it is important to find a balance, since the quest for 
comprehensive and perfect understanding of every problem would 
result in our never actually getting anything done (if one were 
cynical, one might make that observation about this WG as well 
as about the IETF's technical/engineering work).  But "this 
piece will work, so let's do it" needs, IMO, to be carefully 
balanced with "and _what_ problem is it solving?".  And we need 
to have clear explanations and a reasonable consensus about the 
latter.   I suggest that failure to take that step has gotten us 
into a lot of trouble in some areas in the past, and that may be 
where my views and Keith's intersect.

With that distinction between problem analysis and solutions, 
let's look at some of your examples...

--On Thursday, 26 June, 2003 13:41 -0700 Dave Crocker 
<dhc at dcrocker.net> wrote:

>...
> Solving a big, complicated problem by trying to develop a
> single, coherent specification -- no matter how many
> documents; the point is about making all the bits of work
> proceed in lock-step -- is a well-known way to ensure failure.
> One only needs to look at OSI in general, and X.400 in
> particular. (And by way of anticipating one of the rat-holes,
> I'll note that X.400 achieved the rare success of having too
> much integrated all at once while still lacking core bits of
> functionality.)

And I would suggest that the failure in X.400 was that they lost 
track of the problem analysis (which was actually pretty good, 
IMO, in the early stages of that work) in the process of trying 
to accommodate too many different views and "solution" ideas. 
They also left two or three important issues out of that problem 
analysis: a reasonable migration path from earlier systems 
(including, later on, a reasonable migration path from earlier 
versions of X.400) and general usability (aggravated by the fact 
that there were already deployed systems that were easier to 
use).  A possible third important issue was that we already 
had successful experience with deployed, more or less 
peer-to-peer, email systems, while the analysis that led to 
X.400 assumed a highly-structured "email provider" environment. 
I imagine that the missing "core bits of functionality" to which 
you refer fall into one of those areas, but they might be 
additional areas in which the problem analysis was inadequate 
(with that inadequacy carried through into the final results).

By contrast, regardless of the development method, X.400 turned 
out to be fairly modular and composed of separable pieces, as 
evidenced by the number of times people have successfully used 
some pieces of the system and not others.   It is arguably 
better in that regard than SMTP, which assumes --in its handling 
of Received fields if nothing else-- some semblance of 822 
header/body structure in the messages that are being transported.
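
(To make that concrete: every SMTP relay on a message's path is 
expected to prepend a Received trace field, an operation that 
only makes sense if the transported object really is an 
822-style header block followed by a body.  A minimal sketch of 
that step, in Python and with invented names, purely for 
illustration:

    import email.utils

    def prepend_received(message: bytes, from_host: str,
                         by_host: str) -> bytes:
        # Each relay adds its trace field at the front of the
        # header block, so the transport is not fully
        # content-agnostic: it assumes an 822-style header/body
        # split in whatever it carries.
        field = ("Received: from %s by %s; %s\r\n"
                 % (from_host, by_host,
                    email.utils.formatdate())).encode("ascii")
        return field + message

The transfer protocol, in other words, reaches into the message 
structure that the content specification defines.)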

> The usual logic for explaining this problem is that the
> complexity of juggling all the issues, across the entire
> service, simply buries the development effort. At best, it
> ensures that the work is produced much, much later, often
> after the market has found an alternate solution. (And that
> is, most certainly, what happened to OSI. I can elaborate on
> this in some detail, if anyone really challenges this point.)

No disagreement on this.   But it is possible to do modular 
development against a systems-level problem understanding.  The 
only additional requirement that introduces is that the 
little-piece development proposals must be examined against a 
"does this foul up anything else in the system" criterion.  And 
that is, IMO, one which we have too often managed to bypass.

> Breaking down a complex problem into smaller pieces that are
> individually useful does two things that are quite good:
>
> 1) It permits each piece of work to be useful, even if some
> other piece of work has a persistent problem;

Yes, as long as those individual pieces don't prevent each other 
from working, or overconstrain parts of the problem for which 
pieces are not yet developed.

> and 2) it
> permits turning the crank on the specification engine more
> quickly and more frequently. This means that we get
> operational experience more quickly and can then, quickly,
> refine the specifications to match actual field knowledge.
> With large, integrated efforts, the inter-dependencies mean
> that the whole of the work is not useful until the most
> problematic part is completed.

Again, up to a point.  I would have said "until the boundaries 
of the most problematic parts are understood sufficiently to be 
reasonably confident that they really are isolated".   To draw 
from another recent thread, a _solution_ to the problematic 
parts is not necessary, but some understanding of them and their 
implications often is.

> And it means that it is quite a
> few years before there is any feedback from the user
> community. If there were problems with the major design
> decisions that were made, they become essentially impossible
> to change, because things took so long to deliver.

I think one can argue that point into a "lose either way" 
situation, so it is important to strike an appropriate balance. 
From the other perspective, one could claim that the "smaller 
pieces" approach --without adequate problem analysis-- can lead 
to quick and effective feedback from the user community about 
those specific pieces, but that their deployment may then make 
it essentially impossible to solve the more difficult problems 
by constraining possible solutions to them down to a null set. 
That is reasonable if the more difficult problems are also the 
less important ones, but there is no guarantee that is the 
case.  Sometimes, the most difficult problems are also the most 
important.

>...
> Is IP a failure?  Besides independent development of TCP and
> UDP, from the IP core, ICMP has been able to proceed
> independently, as have different address-mapping efforts and,
> for that matter, address interpretation efforts.

But IP, and the post-split TCP/IP, started from, I think, a 
fairly comprehensive understanding of what problems people were 
trying to solve.   Your example, along with the email 
transfer/content split, has more to do with effective 
modularization of the solution(s) than it does with a particular 
style of development.  Indeed, if one goes back and examines the 
history of the transfer/content split in email, a case can be 
made that, as with IP, the two started out fairly integrated and 
were split up after the advantages of doing so became clear.   To 
the extent to which that is true, neither one is a good example 
of _design_ or _development_ on a "small pieces" basis: rather, 
they are examples of redefinition of the problem and 
remodularization of the solution after the initial design and 
development were complete.

>...
> (One could go down a rathole about nonconformance, noting how
> badly we suffer from excessive variance in email system
> behavior, but I claim that is due to lack of enforcement,
> rather than due to any architectural issues. Given the wide
> range of support for HTML, and the like, this might be an
> applications-level issue. But it is not a big-vs-small design
> effort issue.)

Agreed.

> In fact, this highlights one of the other, surprising benefits
> of a divide-and-conquer approach, namely that it permits
> post-hoc additions that were not planned.  When the original
> philosophy is to do as little as possible to be useful,
> knowing that more bits of related work will be done, then it
> often is easier to do some of those bits many years later.

Again, I suggest that the advantages you are claiming here --and 
with which I agree-- are due to effective modularization, rather 
than a particular development approach (such as "divide and 
conquer").  And I believe we are more likely to get the 
modularization right if we have a moderately complete 
understanding of the problem --a systems analysis of properties, 
relationships, and interactions-- rather than if we start 
carving off pieces of the problem piecemeal on the assumption 
that we can always fit things back together later.

regards,
    john


