Frameworks for analysis: quality control and resource availability

Bernard Aboba aboba at internaut.com
Tue Jan 28 09:54:46 CET 2003


While I'm very interested in the throughput and delay statistics that are
being collected, I would like to suggest another set of metrics worth
examining, relating to quality/rework as well as resource availability.

One of the tenets of quality control is that poor quality results in
rework, which is costly and limits the throughput of the system. If
quality is bad enough, the system can expend a large portion of its
resources on rework, and throughput declines dramatically; this is the
analog of "congestive collapse".

Systems which do their quality checks toward the end of the process,
rather than along the way, tend to have rework problems in spades. Imagine
a software development process where little or no testing is done until
the day before the product is to ship; or an auto manufacturing plant that
does not do any quality checks on components or assemblies before
(attempting to) drive the car off the assembly line.

Most of us would not want to purchase products manufactured this way, but
this seems like a reasonably accurate description of the IETF process. I
would like to suggest that we attempt to measure rework. Some metrics come
to mind, but of course they are by no means perfect:

* Delay between IETF last call and IESG approval
* Number of draft revisions between IETF last call and RFC publication
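
Neither would be hard to compute mechanically. As a rough sketch (the
record format and field names below are hypothetical, invented for
illustration rather than pulled from any existing tracking system),
something like the following would do:

    # Hypothetical sketch of computing the two rework metrics above; the
    # DocHistory record and its fields are invented for illustration.

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class DocHistory:
        name: str
        last_call: date                # date IETF last call was issued
        iesg_approval: Optional[date]  # None if not yet approved
        revs_at_last_call: int         # draft revision number at last call
        revs_at_rfc: Optional[int]     # revision number at RFC publication

    def last_call_to_approval_days(doc: DocHistory) -> Optional[int]:
        """Metric 1: delay between IETF last call and IESG approval."""
        if doc.iesg_approval is None:
            return None
        return (doc.iesg_approval - doc.last_call).days

    def revisions_after_last_call(doc: DocHistory) -> Optional[int]:
        """Metric 2: draft revisions between IETF last call and RFC."""
        if doc.revs_at_rfc is None:
            return None
        return doc.revs_at_rfc - doc.revs_at_last_call

    # Example with made-up numbers:
    doc = DocHistory("draft-example-foo", date(2002, 3, 1),
                     date(2002, 9, 15), 7, 11)
    print(last_call_to_approval_days(doc))  # 198 days of post-last-call delay
    print(revisions_after_last_call(doc))   # 4 revisions of rework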

In terms of resource constraints, I would like to suggest that the IESG may
not be the true bottleneck of the process -- it may actually reside in the
availability of individuals technically qualified to supervise the work --
the "team" put in place to manage the WG. This includes not only the
chairs, but also the security advisors, document editors, and sponsoring
directorate members.

If true, the solution to this would *not* be to attempt to get those
individuals to "do more work" -- but rather to determine whether the
resources exist for success before chartering a WG in the first place.
The goal of this "resource assessment" would be to determine the
likelihood of rework and put in place the resources to increase the
likelihood of success.

Long ago, in a former life, I was involved in studies of engineering
processes, looking in particular at the factors that resulted in projects
going way over budget and being delivered very late. Within a restricted
class of projects (e.g. chemical plants) it was often possible to develop
metrics that would predict rework. The more upfront work was done on the
design, the better the delivery metrics turned out to be.

Often at the time a WG is chartered, the major hurdles can be anticipated.
Perhaps there are some major security issues; or transport problems that
are known to be hard. It would seem more than reasonable to require the WG
to sign up appropriate security and transport advisors who would commit
the time for early reviews to ensure quality.

I'd also suggest that the time to do early review may be before documents
officially enter the system: at the time they become WG work items. If the
resources are not available to complete the work successfully, or if the
document does not meet basic quality standards, then we would exercise
admission control on the edge.
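
To make the "admission control" analogy concrete, here is a purely
hypothetical sketch of the kind of check that could run at that edge; the
criteria and names are invented for illustration, not a proposal for a
specific tool:

    # Hypothetical "admission control" check for a candidate WG work item.
    # The data structure and criteria are illustrative only.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CandidateWorkItem:
        draft_name: str
        has_security_advisor: bool
        has_transport_advisor: bool
        open_quality_issues: List[str] = field(default_factory=list)

    def admit(item: CandidateWorkItem, needs_transport_review: bool = True) -> bool:
        """Admit the draft only if the reviewing resources are committed
        and it meets basic quality standards on the way in."""
        if not item.has_security_advisor:
            return False
        if needs_transport_review and not item.has_transport_advisor:
            return False
        if item.open_quality_issues:  # e.g. missing Security Considerations
            return False
        return True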

In summary, I would suggest that there may be value in looking at the
IETF quality control process, the metrics for quality, and in particular,
how those quality checks might be front-loaded.



-----------------------------------------------------------------------
On December 11, 2002, Harald said:

After a month or so of debate since Atlanta, we might actually be closer
to crystallizing out a few core problems than we were then, even though it
might not seem so from the number of fixes suggested....

this note tries to take the high level view (100.000 foot is about 10
times the ordinary "10.000 foot view", so this is VERY high....)

1) The standards activity in the WGs could function better.
   There are many causes for them not doing so, including:
   - Polarization because of company positions
   - Polarization because of human nature
   - Inability to stick to promises to do work
   - Inability to write readable technical documents
   - Lack of clue of WG chairs in how to guide consensus
   - Lack of architectural insight
   and so on and so forth.
   Most of the solutions suggested are of the form
   "clueful people must do more work".

2) The IESG does not have capacity to do more work.
   In fact, it is struggling with handling its current regime, and has a
   serious lack of breath left over for getting the 10.000 foot
   perspective back and actually changing the ways things are done, let
   alone catching up with the stuff that's been dropped on the floor over
   time.

   This, again, has many causes, most of them more related to scaling than
   anything else.
   The solution set seems to revolve around:
   - Get someone else to do part of the work
     (farm out policy-setting, document review, WG management....)
   - Institute rules that are simpler to manage to than current rules
   - Reduce the size of the IETF

Of course, the fact that "clueful people" in the first problem's solution
is often short-circuited into "the IESG" in the second problem means that
the two problems are in destructive interference with each other - they
make each other worse.

Does this rough bipartition of the problem space ring bells in people's
minds?

                      Harald



