[R-C] bufferbloat-induced delay at a non-bottleneck node

Henrik Lundin henrik.lundin at webrtc.org
Wed Oct 12 07:48:10 CEST 2011


Bufferbloat is a drag, indeed. The way I see it, a delay-sensing CC will
*have* to lose the battle against TCP flows over "bufferbloated" devices,
at least in the steady-state case (i.e., against persistent TCP flows).
Otherwise we will have created an algorithm that may just as well fill up
the buffers on its own, even without any competing flows. I'm pessimistic
that there is any way around this other than QoS.

For the transient case that you are describing here, I think that we still
cannot do much about the actual filling of the buffers; we are not to blame
for it. But as you point out, we can probably do some more about how to
respond when the transient cross-traffic hits. I'm not sure how, though.

/Henrik



On Wed, Oct 12, 2011 at 7:12 AM, Randell Jesup <randell-ietf at jesup.org> wrote:

> Jim:  We're moving this discussion to the newly-created mailing sub-list -
>   Rtp-congestion at alvestrand.no
>   http://www.alvestrand.no/mailman/listinfo/rtp-congestion
>
> If you'd like to continue this discussion (and I'd love you to do so),
> please join the mailing list.  (Patrick, you may want to join too and read
> the very small backlog of messages (perhaps 10 so far)).
>
> On 10/11/2011 4:17 PM, Jim Gettys wrote:
>
>> On 10/11/2011 03:11 AM, Henrik Lundin wrote:
>>
>>>
>>>
>>> I do not agree with you here. When an over-use is detected, we propose
>>> to measure the /actual/ throughput (over the last 1 second), and set
>>> the target bitrate to beta times this throughput. Since the measured
>>> throughput is a rate that evidently was feasible (at least during that
>>> 1 second), any beta < 1 should ensure that the buffers get drained,
>>> but of course at different rates depending on the magnitude of beta.
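>>>
>>> In rough Python, the rule is simply this (the names and the example
>>> beta value are illustrative, not fixed constants of the proposal):
>>>
>>>     def on_overuse(bytes_in_last_second, beta=0.85):
>>>         # Throughput actually achieved over the last 1 second: this
>>>         # rate was evidently feasible, so targeting beta * rate with
>>>         # any beta < 1 should let the queues drain.
>>>         measured_bps = bytes_in_last_second * 8
>>>         return beta * measured_bps  # new target bitrate, bits/s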
>>>
>> Take a look at the data from the ICSI netalyzr: you'll find scatter
>> plots at:
>>
>> http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
>>
>> Note the different coloured lines.  They represent the amount of
>> buffering measured in the broadband edge in *seconds*.  Also note that
>> for various reasons, the netalyzr data is actually likely
>> underestimating the problem.
>>
>
> Understood.  Though that's not entirely relevant to this problem, since the
> congestion-control mechanisms we're using/designing here are primarily
> buffer-sensing algorithms that attempt to keep the buffers in a drained
> state.  If there's no competing traffic at the bottleneck, they're likely to
> do so fairly well, though more simulation and real-world tests are needed.
>  I'll note that several organizations (Google/GIPS, Radvision, and my old
> company WorldGate) have found that these types of congestion-control
> algorithms are quite effective in practice.
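>
> To illustrate the buffer-sensing idea, here is a minimal Python sketch
> (the threshold and names are invented for the example, not taken from
> any of those implementations):
>
>     def queue_is_filling(delay_samples_ms, threshold_ms=5.0):
>         """Flag over-use when one-way delay is trending upward,
>         i.e. packets are sitting progressively longer in a queue
>         instead of the buffers staying drained."""
>         if len(delay_samples_ms) < 2:
>             return False
>         growth = delay_samples_ms[-1] - delay_samples_ms[0]
>         return growth > threshold_ms
>
> On a sustained upward trend the sender is asked to back off; on a flat
> or falling trend it can cautiously probe upward again.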
>
> However, it isn't irrelevant to the problem either:
>
> This class of congestion-control algorithms is subject to "losing" if
> faced with a sustained high-bandwidth TCP flow like some of your tests,
> since they back off when TCP isn't seeing any restriction (loss) yet.
> Eventually TCP will fill the buffers.
>
> More importantly, perhaps, bufferbloat combined with the bursty nature of
> browser network stacks (and websites) optimizing for page-load time means
> you can get a burst of data at a link that isn't normally the bottleneck.
>
> The basic scenario goes like this (a toy model of steps 4 and 5 follows
> the list):
>
> 1. established UDP flow near bottleneck limit at far-end upstream
> 2. near-end browser (or browser on another machine in the same house)
>   initiates a page-load
> 3. near-end browser opens "many" TCP connections to the site and
>   other sites that serve pieces (ads, images, etc) of the page.
> 4. Rush of response data saturates the downstream link to the
>   near-end, which was not previously the bottleneck.  Due to
>   bufferbloat, this can cause a significant amount of data to be
>   temporarily buffered, delaying competing UDP data significantly
>   (tenths of a second, perhaps >1 second in some cases).  This is hard
>   to model accurately; real-world tests are important.
> 5. Congestion-control algorithm notices transition to buffer-
>   induced delay, and tells the far side to back off.  The latency
>   of this decision may help us avoid over-reacting, as we have to
>   see increasing delay which takes a number of packets (at least
>   1/10 second, and easily could be more).  Also, the result of
>   the above "inrush"/pageload-induced latency may not trigger the
>   congestion mechanisms we discuss here, as we might see a BIG jump
>   in delay followed by steady delay or a ramp down (since if the
>   buffer has suddenly jumped from drained to full, all it can do is
>   be stable or drain).
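>
> A toy queue model of steps 4 and 5 (Python; every number is made up to
> show the shape of the effect, not a measurement):
>
>     LINK_RATE = 1000000 / 8.0   # 1 Mbit/s downstream link, in bytes/s
>     BUFFER = 128 * 1024         # 128 KB of "bloated" device buffer
>
>     # A page-load burst lands on the previously idle downstream queue.
>     burst = min(BUFFER, 200 * 1500)   # roughly 200 full-size packets
>     added_delay = burst / LINK_RATE   # extra wait for a UDP packet
>                                       # arriving just behind the burst
>     print("%d bytes queued -> +%.2f s of delay" % (burst, added_delay))
>     # -> 131072 bytes queued -> +1.05 s of delay
>
> A media packet arriving just behind the burst sees the whole delay at
> once: the sudden jump from drained to full described in step 5.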
>
> Note that Google's current algorithm (which you comment on above) uses
> recent history for choosing the reduction; in this case it's hard to say
> what the result would be.  If it invokes the backoff at the start of the
> pageload, then the bandwidth received recently is the current bandwidth,
> so the new target is only slightly below the current rate.  If it happens
> after data has queued behind the burst of TCP traffic, then when the
> backoff is generated we'll have gotten almost no data through "recently",
> and we may back off all the way to the minimum bandwidth: an
> over-reaction, depending on the measurement time constant and on how fast
> that burst can fill the downstream buffers.
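>
> To make the two cases concrete (a Python sketch; the floor value and
> the rates are invented for illustration, not actual constants):
>
>     MIN_BITRATE = 50000  # assumed lower bound, bits/s
>
>     def new_target(recently_received_bps, beta=0.85):
>         # The backoff keys off what was *received* recently, so the
>         # outcome depends on when over-use fires relative to the burst.
>         return max(MIN_BITRATE, beta * recently_received_bps)
>
>     print(new_target(1000000))  # at burst onset: 850000.0, a mild cut
>     print(new_target(20000))    # starved behind the queue: 50000,
>                                 # pinned to the floor, i.e. the
>                                 # over-reaction described above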
>
> Now, in practice this is likely messier: the pageload doesn't generate one
> huge sudden block of data that fills the buffers, so there's some upward
> slope to the delay as you head toward saturation of the downstream buffers.
> There's very little you can do about this, though backing off hard may
> help: the less data you put onto the end of this overloaded queue
> (assuming the pageload flows have ended or soon will), the sooner the
> queue will drain and low latency will be re-established.
>
> Does the ICSI data call out *where* the bufferbloat occurs?
>
>  Then realise that when congested, nothing you do can react faster than
>> the RTT including the buffering.
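>>
>> Concretely, with illustrative numbers: given a 50 ms base RTT and
>> 1 s of data queued at the bottleneck,
>>
>>     reaction_floor = 0.050 + 1.0  # base RTT + queueing delay, seconds
>>
>> so even a perfect, instantaneous back-off decision takes effect more
>> than a second after the overload began.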
>>
>> So if your congestion is in the broadband edge (where it often/usually
>> is), you are in a world of hurt, and you can't use any algorithm that
>> has fixed time constants, even one as long as 1 second.
>>
>> Wish this weren't so, but it is.
>>
>> Bufferbloat is a disaster...
>>
>
> Given the loss-based algorithms for TCP/etc, yes.  We have to figure out
> how to (as reliably *as possible*) deliver low-latency data in this
> environment.
>
>
> --
> Randell Jesup
> randell-ietf at jesup.org
>