How we decide that we have decided (was: Re: Sampling)
John C Klensin
john-ietf at jck.com
Wed Jul 30 20:41:37 CEST 2003
Dave,
I don't want to put words into your mouth, or that of any of the
IESG, so I'm going to try to go up a half-level of abstraction...
(1) It had better be possible to talk about, and question, the
quality of a claimed consensus. "Quality" must be able to
include bias* -- in the narrow statistical sense, not, in
itself, implying that anything evil is happening -- of whatever
convenience samples* happen to have been surveyed relative to
the whole population of participants in some WG, the (usually
larger) population of people interested in the WG but not
participating for some reason, the (usually larger) population
of IETF participants, and even the (larger yet) population of
(to use ANSI's term) "materially concerned" Internet users and
suppliers.
If we can't do that, then we have absolutely no protection
against some individual or group generating so much noise (or
hostility, or other unpleasantnesses) that they drive everyone
who disagrees with them away and then claim consensus for their
position. Our protection against that sort of attack is,
primarily, the judgment of WG Chairs, Area Directors, and the
whole IESG, backed up, if necessary, by appeals processes to
force additional scrutiny on what occurred. Those judgments
are, ultimately, extremely subjective, which is too bad, but
there is no other way. In particular, we will not get any
protection from discussions of sampling theory, since we don't
have good models or descriptions of either the convenience* or
self-selection* samples involved or of the underlying presumed
populations.
In addition, our oft-repeated claim that we rely more on mailing
list discussions and conclusions than on the group of people who
happen to show up at a meeting and talk or wave their arms
becomes completely bogus if we can't evaluate the quality of a
claimed consensus. I hope we are not headed in that direction.
(2) Why has there been little or no response other than from a
few IESG members? I suggest that every participant in the
IETF, including the IESG and those few for whom IETF
participation is a full-time job, must regularly, perhaps
continually, make decisions about what is important and what
isn't. The criteria for importance will differ from one of us
to another, but might rationally include
- "will my putting energy into this make any difference
at the end of the day" and/or
- "will this effort make any real difference when
examined from the perspective of a few years from now",
and/or
- "is my following this and contributing to it likely to
be of enough value to overcome the irritation costs and
increased blood pressure it will probably involve",
as well as the traditional opportunity cost question of
"if I don't spend time on this, what else could I be
doing that would be more useful or productive?".
In my own case, my original use of the term "bias" was, I
believe, statistically correct. It didn't take very many
transactions on this thread for me to conclude that, given the
personal criteria I'm applying these days, getting drawn into a
discussion about sampling terminology and theory and its
relationship to IETF meetings was not an investment I felt like
making. So I have been sampling* the thread**, but not
responding. I changed that policy/strategy (and changed the
subject line) when the thread turned into what appear to be a
set of assertions about agreements/conclusions/consensus that
do and do not exist based on other responses and where they come
from.
(3) Once a WG, or set of discussion threads, starts being taken
over by a relatively small number of passionate people who are
generating a lot of often-repetitive postings, we need to be,
IMO, _extremely_ careful about how we interpret whatever polls
and consensus calls we make and even more careful about traffic
analysis. It is very easy to confuse "consensus among those who
haven't given up on the effort" and "consensus among the group
who are concerned about the issues". Those two groups are often
different and the reasons for the differences may be very
important. They are also often different from "the group who
happens to be gathered together in a given place" or "the group
who showed up at a meeting (or plenary)" or "the group that is
actively participating in a WG at a particular time".
Analysis, whether by statistical or other means, of those
differences may be quite instructive but, normally, only if it
moves beyond repeated assertions of personal belief (or of
beliefs and claims about what others mean by their silence).
(4) In the absence of analysis good enough to project from a
subset (whether it is strictly a sample or not and whether that
sample is biased or not) of some population to the opinions of
that population, just about the only thing one can safely do is
to accept the opinions (or measurements) as expressed, while
being cautious about their interpretation based on an
understanding of threats to validity*.
(5) Let me conclude with a personally-troubling example of how
some of this works. Melinda has sent out two messages to try to
validate the meeting "hums". That is a legitimate thing for a
WG Chair to do in the process of trying to determine consensus.
I've read both messages, thought about responding on the several
issues in which my personal opinions are different from the
consensus she has identified, and decided to not do so. Why?
Because in my personal opinion, this WG has been extremely
helpful in discussing and focusing on the issues, but has now
passed sufficiently far into the range of diminishing returns to
have outlived its usefulness. If I were the AD, I'd be trying
to deliver a "wrap this up before I shut it down; it shouldn't
drag out much longer" message.
I've said that before in different ways, including at the
plenary where, independent of how one interprets it, my "let's
get on with it and do so without setting up more apparatus"
comments appeared to get more support than most of the specifics
from this WG or the process of which it is a part. A corollary
of that personal conclusion of mine is that it would be useful
to publish the issues list, as it exists today, as a snapshot of
some very serious and thoughtful community analysis, but without
either worrying a great deal about whether it represents
consensus (and of what) or about getting every statement exactly
right. Were we to follow that path, the one thing that would be
important to get right in the document would be a clear
statement about what it does, and does not, represent.
So, just as I challenged Harald at the plenary about the
questions he did (or didn't) ask, I should probably challenge
Melinda to ask for input on who actually thinks the charge of
this WG is worth more time and, if so, how much more and what we
should use as a stopping rule. The latter is especially
important given the subject matter of the WG: it would be _lots_
better if we reached consensus to shut ourselves down than if we
wait long enough for it to be obvious that the AD should do it.
Is it beneficial to the WG, or to the IETF, for me to repeat
those conclusions three or four --or a few dozen-- more times?
I doubt it. I think I've been heard and that those who are
going to agree have agreed already and that those who haven't
won't change their minds if I repeat myself. So I'm trying to
move on, even if the WG is not. What is the relationship
between those conclusions and Melinda's "hum closure" calls?
Well, I don't agree with several of her conclusions and,
especially since there has mostly been silence, she is on thin
ice if she concludes that everyone who hasn't expressed
disagreement agrees with her. But I don't disagree enough
--relative to my assessment of importance and what else I could
be doing with my time-- to want to stand up and say "I disagree,
and here is my analysis and my reasons". Bad investment of
time: I personally believe that the document today is close
enough for any real purpose, viewed from five years hence, that
it is likely to serve, and I think we should get on with it.
john
Footnotes:
* The terms "sample" and "sampling" have very precise meanings in
statistics. They involve systematic mechanisms for determining
or describing the relationship between the sample and some
population or universe that it is supposed to represent, so that
inferences from the sample can be projected (or translated) into
inferences about the population. If it is not possible to
describe either the population or the relationship of the
"sampled" subpopulation to it, then "sample" belongs in quotes.
The term "convenience sample" refers to a subpopulation that is
selected, not by some formal or systematic means, but on the
basis of, e.g., whoever happens to be handy or most easily
accessed. The term "self-selected sample" refers to what is
usually considered a special sort of convenience sample in which
the members of the selected subpopulation select themselves,
often based on their particular positions on whatever is being
measured. Among the people who do sample design for a living,
both "convenience sample" and "self-selected sample" are terms
of abuse or, sometimes, derision.
"Bias", technically, refers to the difference, or lack thereof,
between a sample (selected under some sampling model) and the
relevant population. If the population parameters, other than
size, are different from the equivalent parameters of the
presumed sample, then the sample is biased --and this is the
important issue-- one cannot safely make direct inferences from
findings or conclusions about the sample to findings or
inferences of the population. Convenience samples can be
unbiased, but that doesn't happen by accident and, in
statistical sampling circles, the burden of proof is on whoever
makes that claim and the burden is pretty high. And, finally,
"threat to validity" is another technical term, referring to the
whole collection of things (hidden systematic bias is only one)
that can make projection of an inference from a sample onto a
population incorrect, even if "sample statistics", "confidence
intervals", etc., indicate that things should be ok.
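The bias point above can be made concrete with a small
simulation. This is a sketch with invented numbers, not a model
of any actual IETF population or list; in particular, the
"opponents post more often" selection rule below is purely
hypothetical, and it stands in for exactly the kind of selection
mechanism we usually cannot describe for a mailing list:

```python
import random

random.seed(7)

# Hypothetical population of 100,000 participants, each with an
# opinion score in [0, 1] (say, degree of support for a proposal),
# drawn uniformly. Illustrative numbers only.
population = [random.random() for _ in range(100_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Simple random sample: every member is equally likely to be chosen,
# so the sample mean is an unbiased estimate of the population mean.
random_sample = random.sample(population, 500)

# Self-selected "sample": suppose each person posts with probability
# (1 - opinion), i.e., opponents are more likely to speak up. This
# selection model is the hidden piece that makes inference unsafe.
self_selected = [x for x in population if random.random() < 1 - x]

pop_mean = mean(population)     # ~0.50
srs_mean = mean(random_sample)  # close to 0.50
ss_mean = mean(self_selected)   # well below 0.50: biased low
```

Note that the self-selected subgroup here is large (roughly half
the population), yet its mean is still far from the population's;
size alone does not cure selection bias, which is why the burden
of proof on an unbiasedness claim is high.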
** Yes, I have a sampling model for the list that determines
what I read and what I don't. I don't feel like describing it,
partially because my doing so might possibly introduce
additional biases.
Finally, an observation on the footnotes: By most reasonable
definitions of the term, I'm a card-carrying statistician or at
least a card-carrying data analyst (but I am not a sampling
expert and have never had any desire to be one). It has been a
while, but I've taught this stuff and made a day-job living
doing some of it. And I find it more than mildly interesting
that the other folks in the IETF whom I'm aware of having
similar backgrounds have, unlike myself, been smart enough to
not waste their time speaking up on this issue.
----------
--On Wednesday, 30 July, 2003 15:00 -0700 Dave Crocker
<dhc at dcrocker.net> wrote:
> Folks,
>
>>> I believe that both the working group participation and the
>>> IETF Vienna plenary presentation represent very, very highly
>>> biased samples of the IETF population.
>
> HTA> Yes. In both cases, they consist of the people that show
> HTA> up.
> and
> RB> all samples are biased. we just don't like the ones that
> RB> have results which disagree with our own perceptions.
> RB> funny monkeys we are.
>
> Two points from this brief thread:
>
>
> 1. Two members of the IESG are clear that we need not be
> concerned with the basis on which we interpret a meeting's
> results.
>
> Literally the only thing that matters is who shows up.
>
> In social research methodology discussions about the topic of
> sampling bias, there is a careful distinction between
> "population" and "sample". There are some useful decades of
> experience with this distinction. The same useful history
> concerns biasing aspects of the ways questions are asked.
>
> From the two postings on this thread, it appears that we are
> simply to treat sample and population as the same, and we need
> not worry about the particular form of questions to the group.
>
> Originally, the IETF worried quite a bit about being
> inclusive, in its formulation of meeting logistics, its
> assessment and use of consensus statements at meetings, and
> its general attention to the preferences of the general
> community. All of this is difficult and unpleasant.
>
> I have always felt that that careful attention to real
> inclusiveness was a primary source of legitimacy for the IETF.
> It was in this context, of bending over backwards to be
> inclusive, that the focus on those who showed up and
> participated rang true.
>
> Apparently things have changed. It appears that we no longer
> have to worry whether meeting logistics serve to exclude
> people or whether IETF process serves to disenfranchise folks.
> We just tally who shows up -- or rather, who speaks up -- and
> that's that.
>
> I'm guessing that some folks might disagree with my assessment.
> Clarifications and corrections would be greatly appreciated.
>
>
> 2. No one else has contributed to this thread.
>
> Hence they have not "shown up". By definition this means that
> the two IESG members are correct and that I am wrong about my
> concerns.
>
> Luckily, we do not need to worry whether folks have, in fact,
> given up on this forum or this topic. They haven't
> contributed, so their views are not relevant.
>
> d/
> --
> Dave Crocker <mailto:dcrocker at brandenburg.com>
> Brandenburg InternetWorking <http://www.brandenburg.com>
> Sunnyvale, CA USA <tel:+1.408.246.8253>,
> <fax:+1.866.358.5301>