How we decide that we have decided (was: Re: Sampling)
John C Klensin
john-ietf at jck.com
Wed Jul 30 20:41:37 CEST 2003
Dave,
I don't want to put words into your mouth, or that of any of the
IESG, so I'm going to try to go up a half-level of abstraction...
(1) It had better be possible to talk about, and question, the
quality of a claimed consensus. "Quality" must be able to
include bias* -- in the narrow statistical sense, not, in
itself, implying that anything evil is happening -- of whatever
convenience samples* happen to have been surveyed relative to
the whole population of participants in some WG, the (usually
larger) population of people interested in the WG but not
participating for some reason, the (usually larger) population
of IETF participants, and even the (larger yet) population of
(to use ANSI's term) "materially concerned" Internet users and
suppliers.
If we can't do that, then we have absolutely no protection
against some individual or group generating so much noise (or
hostility, or other unpleasantnesses) that they drive everyone
who disagrees with them away and then claim consensus for their
position. Our protection against that sort of attack is,
primarily, the judgment of WG Chairs, Area Directors, and the
whole IESG, backed up, if necessary, by appeals processes to
force additional scrutiny on what occurred. Those judgments
are, ultimately, extremely subjective, which is too bad, but
there is no other way. In particular, we will not get any
protection from discussions of sampling theory, since we don't
have good models or descriptions of either the convenience* or
self-selection* samples involved or of the underlying presumed
populations.
In addition, our oft-repeated claim that we rely more on mailing
list discussions and conclusions than on the group of people who
happen to show up at a meeting and talk or wave their arms
becomes completely bogus if we can't evaluate the quality of a
claimed consensus. I hope we are not headed in that direction.
(2) Why has there been little or no response other than from a
few IESG members? I suggest that every participant in the
IETF, including the IESG and those few for whom IETF
participation is a full-time job, must regularly, perhaps
continually, make decisions about what is important and what
isn't. The criteria for importance will differ from one of us
to another, but might rationally include
- "will my putting energy into this make any difference
at the end of the day" and/or
- "will this effort make any real difference when
examined from the perspective of a few years from now",
and/or
- "is my following this and contributing to it likely to
be of enough value to overcome the irritation costs and
increased blood pressure it will probably involve",
as well as the traditional opportunity cost question of
"if I don't spend time on this, what else could I be
doing that would be more useful or productive?".
In my own case, my original use of the term "bias" was, I
believe, statistically correct. It didn't take very many
transactions on this thread for me to conclude that, given the
personal criteria I'm applying these days, getting drawn into a
discussion about sampling terminology and theory and its
relationship to IETF meetings was not an investment I felt like
making. So I have been sampling* the thread**, but not
responding. I changed that policy/strategy (and changed the
subject line) when the thread turned into what appear to be a
set of assertions about agreements/conclusions/consensus that
do and do not exist based on other responses and where they come
from.
(3) Once a WG, or set of discussion threads, starts being taken
over by a relatively small number of passionate people who are
generating a lot of often-repetitive postings, we need to be,
IMO, _extremely_ careful about how we interpret whatever polls
and consensus calls we make and even more careful about traffic
analysis. It is very easy to confuse "consensus among those who
haven't given up on the effort" and "consensus among the group
who are concerned about the issues". Those two groups are often
different and the reasons for the differences may be very
important. They are also often different from "the group who
happens to be gathered together in a given place" or "the group
who showed up at a meeting (or plenary)" or "the group that is
actively participating in a WG at a particular time".
Analysis, whether by statistical or other means, of those
differences may be quite instructive but, normally, only if it
moves beyond repeated assertions of personal belief (or of
beliefs and claims about what others mean by their silence).
(4) In the absence of analysis good enough to project from a
subset (whether it is strictly a sample or not and whether that
sample is biased or not) of some population to the opinions of
that population, just about the only thing one can safely do is
to accept the opinions (or measurements) as expressed, while
being cautious about their interpretation based on an
understanding of threats to validity*.
(5) Let me conclude with a personally-troubling example of how
some of this works. Melinda has sent out two messages to try to
validate the meeting "hums". That is a legitimate thing for a
WG Chair to do in the process of trying to determine consensus.
I've read both messages, thought about responding on the several
issues in which my personal opinions are different from the
consensus she has identified, and decided to not do so. Why?
Because in my personal opinion, this WG has been extremely
helpful in discussing and focusing on the issues, but has now
passed sufficiently far into the range of diminishing returns to
have outlived its usefulness. If I were the AD, I'd be trying
to deliver a "wrap this up before I shut it down; it shouldn't
drag out much longer" message.
I've said that before in different ways, including at the
plenary where, independent of how one interprets it, my "let's
get on with it and do so without setting up more apparatus"
comments appeared to get more support than most of the specifics
from this WG or the process of which it is a part. A corollary
of that personal conclusion of mine is that it would be useful
to publish the issues list, as it exists today, as a snapshot of
some very serious and thoughtful community analysis, but without
either worrying a great deal about whether it represents
consensus (and of what) or about getting every statement exactly
right. Were we to follow that path, the one thing that would be
important to get right in the document would be a clear
statement about what it does, and does not, represent.
So, just as I challenged Harald at the plenary about the
questions he did (or didn't) ask, I should probably challenge
Melinda to ask for input on who actually thinks the charge of
this WG is worth more time and, if so, how much more and what we
should use as a stopping rule. The latter is especially
important given the subject matter of the WG: it would be _lots_
better if we reached consensus to shut ourselves down than if we
wait long enough for it to be obvious that the AD should do it.
Is it beneficial to the WG, or to the IETF, for me to repeat
those conclusions three or four --or a few dozen-- more times?
I doubt it. I think I've been heard and that those who are
going to agree have agreed already and that those who haven't
won't change their minds if I repeat myself. So I'm trying to
move on, even if the WG is not. What is the relationship
between those conclusions and Melinda's "hum closure" calls?
Well, I don't agree with several of her conclusions and,
especially since there has mostly been silence, she is on thin
ice if she concludes that everyone who hasn't expressed
disagreement agrees with her. But I don't disagree enough
--relative to my assessment of importance and what else I could
be doing with my time-- to want to stand up and say "I disagree,
and here is my analysis and my reasons". Bad investment of
time: I personally believe that the document today is close
enough for any real purpose, viewed from five years hence, that
it is likely to serve, and I think we should get on with it.
john
Footnotes:
* The terms "sample" and "sampling" have very precise meanings in
statistics. They involve systematic mechanisms for determining
or describing the relationship between the sample and some
population or universe that it is supposed to represent, so that
inferences from the sample can be projected (or translated) into
inferences about the population. If it is not possible to
describe either the population or the relationship of the
"sampled" subpopulation to it, then "sample" belongs in quotes.
The term "convenience sample" refers to a subpopulation that is
selected, not by some formal or systematic means, but on the
basis of, e.g., whoever happens to be handy or most easily
accessed. The term "self-selected sample" refers to what is
usually considered a special sort of convenience sample in which
the members of the selected subpopulation select themselves,
often based on their particular positions on whatever is being
measured. Among the people who do sample design for a living,
both "convenience sample" and "self-selected sample" are terms
of abuse or, sometimes, derision.
"Bias", technically, refers to the difference, or lack thereof,
between a sample (selected under some sampling model) and the
relevant population. If the population parameters, other than
size, are different from the equivalent parameters of the
presumed sample, then the sample is biased --and this is the
important issue-- one cannot safely make direct inferences from
findings or conclusions about the sample to findings or
inferences of the population. Convenience samples can be
unbiased, but that doesn't happen by accident and, in
statistical sampling circles, the burden of proof is on whoever
makes that claim and the burden is pretty high. And, finally,
"threat to validity" is another technical term, referring to the
whole collection of things (hidden systematic bias is only one)
that can make projection of an inference from a sample onto a
population incorrect, even if "sample statistics", "confidence
intervals", etc., indicate that things should be ok.
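The bias point above can be made concrete with a small
simulation. This is a sketch with invented numbers, not a model
of any actual IETF population or list; in particular, the
"opponents post more often" selection rule below is purely
hypothetical, and it stands in for exactly the kind of selection
mechanism we usually cannot describe for a mailing list:

```python
import random

random.seed(7)

# Hypothetical population of 100,000 participants, each with an
# opinion score in [0, 1] (say, degree of support for a proposal),
# drawn uniformly. Illustrative numbers only.
population = [random.random() for _ in range(100_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Simple random sample: every member is equally likely to be chosen,
# so the sample mean is an unbiased estimate of the population mean.
random_sample = random.sample(population, 500)

# Self-selected "sample": suppose each person posts with probability
# (1 - opinion), i.e., opponents are more likely to speak up. This
# selection model is the hidden piece that makes inference unsafe.
self_selected = [x for x in population if random.random() < 1 - x]

pop_mean = mean(population)     # ~0.50
srs_mean = mean(random_sample)  # close to 0.50
ss_mean = mean(self_selected)   # well below 0.50: biased low
```

Note that the self-selected subgroup here is large (roughly half
the population), yet its mean is still far from the population's;
size alone does not cure selection bias, which is why the burden
of proof on an unbiasedness claim is high.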
** Yes, I have a sampling model for the list that determines
what I read and what I don't. I don't feel like describing it,
partially because my doing so might possibly introduce
additional biases.
Finally, an observation on the footnotes: By most reasonable
definitions of the term, I'm a card-carrying statistician or at
least a card-carrying data analyst (but I am not a sampling
expert and have never had any desire to be one). It has been a
while, but I've taught this stuff and made a day-job living
doing some of it. And I find it more than mildly interesting
that the other folks in the IETF whom I'm aware of having
similar backgrounds have, unlike myself, been smart enough to
not waste their time speaking up on this issue.
----------
--On Wednesday, 30 July, 2003 15:00 -0700 Dave Crocker
<dhc at dcrocker.net> wrote:
> Folks,
>
>>> I believe that both the working group participation and the
>>> IETF Vienna plenary presentation represent very, very highly
>>> biased samples of the IETF population.
>
> HTA> Yes. In both cases, they consist of the people that show
> HTA> up.
> and
> RB> all samples are biased. we just don't like the ones that
> RB> have results which disagree with our own perceptions.
> RB> funny monkeys we are.
>
> Two points from this brief thread:
>
>
> 1. Two members of the IESG are clear that we need not be
> concerned with the basis on which we interpret a meeting's
> results.
>
> Literally the only thing that matters is who shows up.
>
> In social research methodology discussions about the topic of
> sampling bias, there is a careful distinction between
> "population" and "sample". There are some useful decades of
> experience with this distinction. The same useful history
> concerns biasing aspects of the ways questions are asked.
>
> From the two postings on this thread, it appears that we are
> simply to treat sample and population as the same, and we need
> not worry about the particular form of questions to the group.
>
> Originally, the IETF worried quite a bit about being
> inclusive, in its formulation of meeting logistics, its
> assessment and use of consensus statements at meetings, and
> its general attention to the preferences of the general
> community. All of this is difficult and unpleasant.
>
> I have always felt that that careful attention to real
> inclusiveness was a primary source of legitimacy for the IETF.
> It was in this context, of bending over backwards to be
> inclusive, that the focus on those who showed up and
> participated rang true.
>
> Apparently things have changed. It appears that we no longer
> have to worry whether meeting logistics serve to exclude
> people or whether IETF process serves to disenfranchise folks.
> We just tally who shows up -- or rather, who speaks up -- and
> that's that.
>
> I'm guessing that some folks might disagree with my assessment.
> Clarifications and corrections would be greatly appreciated.
>
>
> 2. No one else has contributed to this thread.
>
> Hence they have not "shown up". By definition this means that
> the two IESG members are correct and that I am wrong about my
> concerns.
>
> Luckily, we do not need to worry whether folks have, in fact,
> given up on this forum or this topic. They haven't
> contributed, so their views are not relevant.
>
> d/
> --
> Dave Crocker <mailto:dcrocker at brandenburg.com>
> Brandenburg InternetWorking <http://www.brandenburg.com>
> Sunnyvale, CA USA <tel:+1.408.246.8253>,
> <fax:+1.866.358.5301>