[R-C] bufferbloat-induced delay at a non-bottleneck node

Randell Jesup randell-ietf at jesup.org
Wed Oct 12 07:12:39 CEST 2011


Jim:  We're moving this discussion to the newly-created mailing sub-list -
    Rtp-congestion at alvestrand.no
    http://www.alvestrand.no/mailman/listinfo/rtp-congestion

If you'd like to continue this discussion (and I'd love for you to do 
so), please join that list.  (Patrick, you may want to join too and 
read the very small backlog of messages; perhaps 10 so far.)

On 10/11/2011 4:17 PM, Jim Gettys wrote:
> On 10/11/2011 03:11 AM, Henrik Lundin wrote:
>>
>>
>> I do not agree with you here. When an over-use is detected, we propose
>> to measure the /actual/ throughput (over the last 1 second), and set
>> the target bitrate to beta times this throughput. Since the measured
>> throughput is a rate that evidently was feasible (at least during that
>> 1 second), any beta < 1 should assert that the buffers get drained,
>> but of course at different rates depending on the magnitude of beta.
> Take a look at the data from the ICSI netalyzr: you'll find scatter
> plots at:
>
> http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
>
> Note the different coloured lines.  They represent the amount of
> buffering measured in the broadband edge in *seconds*.  Also note that
> for various reasons, the netalyzr data is actually likely
> underestimating the problem.

Understood.  Though that's not entirely relevant to this problem, since 
the congestion-control mechanisms we're using and designing here are 
primarily buffer-sensing algorithms that attempt to keep the buffers in 
a drained state.  If there's no competing traffic at the bottleneck, 
they're likely to do so fairly well, though more simulation and 
real-world testing are needed.  I'll note that several organizations 
(Google/GIPS, Radvision, and my old company WorldGate) have found that 
these types of congestion-control algorithms are quite effective in 
practice.
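
(For anyone just joining the list, here's the general shape of these 
buffer-sensing algorithms, reduced to a toy Python loop.  This is only 
a sketch under my own assumptions; the names, thresholds, and constants 
are invented, not GIPS's or anyone else's actual code.)

  # Toy buffer-sensing loop: ramp up slowly while queuing delay stays
  # low, cut hard as soon as the bottleneck queue is visibly building.
  DELAY_THRESHOLD_MS = 30   # invented "queue is building" signal
  RAMP_FACTOR = 1.05        # gentle upward probe
  CUT_FACTOR = 0.85         # beta < 1, so the queue can drain

  def update_target(target_bps, queuing_delay_ms, received_bps_last_sec):
      if queuing_delay_ms > DELAY_THRESHOLD_MS:
          # Over-use: drop below what the path just proved it can carry.
          return CUT_FACTOR * received_bps_last_sec
      # Under-use / normal: probe upward slowly.
      return RAMP_FACTOR * target_bps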

However, it isn't irrelevant to the problem either:

This class of congestion-control algorithm is subject to "losing" when 
faced with a sustained high-bandwidth TCP flow like the ones in some of 
your tests, since it backs off while TCP isn't yet seeing any 
restriction (loss).  Eventually TCP will fill the buffers.
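
To see why, compare the two reactions side by side.  This is a 
deliberate caricature of both senders, not either one's real state 
machine:

  # A delay-sensing sender yields at the first sign of queuing; a
  # loss-driven TCP sender keeps growing until the (bloated) buffer
  # finally overflows and drops a packet.
  def delay_sender_step(rate_bps, queuing_delay_ms):
      return rate_bps * 0.85 if queuing_delay_ms > 30 else rate_bps * 1.05

  def loss_sender_step(cwnd_packets, saw_loss):
      # With seconds of buffering, saw_loss stays False for a long time,
      # so this flow keeps climbing while the delay-sensing flow shrinks.
      return cwnd_packets // 2 if saw_loss else cwnd_packets + 1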

More importantly, perhaps, bufferbloat combined with the highly bursty 
behavior of browser network stacks (and of websites) that optimize for 
page-load time means you can get a burst of data at a congestion point 
that isn't normally the bottleneck.

The basic scenario goes like this:

1. An established UDP media flow is running near the bottleneck limit
    on the far-end upstream link.
2. The near-end browser (or a browser on another machine in the same
    house) initiates a page load.
3. The near-end browser opens "many" TCP connections to the site and
    to other sites that serve pieces (ads, images, etc.) of the page.
4. The rush of response data saturates the downstream link to the
    near-end, which was not previously the bottleneck.  Due to
    bufferbloat, this can cause a significant amount of data to be
    temporarily buffered, delaying competing UDP data significantly
    (tenths of a second, perhaps >1 second in some cases).  This is
    hard to model accurately; real-world tests are important.
5. The congestion-control algorithm notices the transition to buffer-
    induced delay and tells the far side to back off (sketched below,
    after this list).  The latency of this decision may help us avoid
    over-reacting, since we have to see increasing delay, which takes
    a number of packets (at least 1/10 second, and easily more).
    Also, the "inrush"/pageload-induced latency above may not trigger
    the congestion mechanisms we discuss here, since we might see a
    BIG jump in delay followed by steady delay or a ramp down (once
    the buffer has suddenly jumped from drained to full, all it can
    do is hold steady or drain).
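
To make step 5 concrete, here is roughly what I mean by "notices the 
transition", as a sketch; the window and threshold are my own invented 
numbers.  The point is that a detector keyed on a *rising* delay trend 
needs on the order of 1/10 second of packets to fire, and can fail to 
fire at all on a single large jump:

  from collections import deque

  class DelayTrendDetector:
      """Flag over-use only when one-way delay rises across most of a
      window of recent packets (roughly the 1/10 second above)."""
      def __init__(self, window=20, step_threshold_ms=0.5):
          self.samples = deque(maxlen=window)
          self.step_threshold_ms = step_threshold_ms

      def on_packet(self, relative_delay_ms):
          self.samples.append(relative_delay_ms)
          if len(self.samples) < self.samples.maxlen:
              return False              # not enough history yet
          s = list(self.samples)
          rising = sum(1 for a, b in zip(s, s[1:])
                       if b - a > self.step_threshold_ms)
          # Require a *sustained* rise: one big jump followed by steady
          # or falling delay yields a single rising step, so it never
          # fires on the inrush case above.
          return rising >= 0.7 * (len(s) - 1)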

Note that Google's current algorithm (which you comment on above) uses 
recent history to choose the reduction; in this case it's hard to say 
what the result would be.  If it invokes the backoff at the start of the 
pageload, then the bandwidth received recently is the current bandwidth, 
so the new bandwidth is the current one minus a small delta.  If it 
happens after our data has queued behind the burst of TCP traffic, then 
when the backoff is generated we'll have gotten almost no data through 
"recently", and we may back off all the way to the minimum bandwidth; an 
over-reaction, depending on the time constant and on how fast that burst 
can fill the downstream buffers.
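
In other words (the beta and floor here are my own guesses for 
illustration; I haven't checked the actual numbers):

  # New target = beta * rate actually received "recently", with a floor.
  def backoff_target(received_bps_last_sec, beta=0.85, floor_bps=30_000):
      return max(beta * received_bps_last_sec, floor_bps)

  # Backoff at the start of the pageload: we received at roughly the
  # full rate, so we only lose a small delta:
  print(backoff_target(1_000_000))   # -> 850000.0
  # Backoff after our packets queued behind the TCP burst: we received
  # almost nothing "recently", so we crash to the minimum:
  print(backoff_target(20_000))      # -> 30000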

Now, in practice this is likely messier: the pageload doesn't generate 
one huge sudden block of data that fills the buffers, so there's some 
upward slope to the delay as you head toward saturation of the 
downstream buffers.  There's very little you can do about this - though 
backing off a lot may help, in that the less data you put onto the end 
of this overloaded queue (assuming the pageload flow has ended or soon 
will), the sooner the queue will drain and low latency will be 
re-established.
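
A back-of-the-envelope illustration of that last point, assuming the 
page-load burst really has stopped (all numbers invented):

  # How long until the bloated downstream buffer drains if we keep
  # sending vs. back off hard?  Net drain = link rate minus our input.
  link_bps    = 2_000_000     # downstream link
  queued_bits = 2_000_000     # ~250 KB left in the buffer by the burst
  our_bps     = 1_000_000     # our media flow before any backoff

  print(queued_bits / (link_bps - our_bps))        # keep sending: ~2.0 s
  print(queued_bits / (link_bps - 0.2 * our_bps))  # cut by 80%:   ~1.1 s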

Does the ICSI data call out *where* the bufferbloat occurs?

> Then realise that when congested, nothing you do can react faster than
> the RTT including the buffering.
>
> So if your congestion is in the broadband edge (where it often/usually
> is), you are in a world of hurt, and you can't use any algorithm that
> has fixed time constants, even one as long as 1 second.
>
> Wish this weren't so, but it is.
>
> Bufferbloat is a disaster...

Given the loss-based congestion-control algorithms in TCP and the like, 
yes.  We have to figure out how to deliver low-latency data in this 
environment as reliably *as possible*.


-- 
Randell Jesup
randell-ietf at jesup.org

