[R-C] Most problems are at the bottleneck: was Re: bufferbloat-induced delay at a non-bottleneck node

Randell Jesup randell-ietf at jesup.org
Fri Oct 14 00:46:57 CEST 2011


*NOTE:* Anyone getting this who isn't on the R-C list, please join; I'm 
stopping CC-ing people after this message.


On 10/13/2011 10:33 AM, Jim Gettys wrote:
> Sorry for the length of this.

No problem; it's a very useful summary.

> Problems are usually at the bottleneck, unless you are suffering from
> general network congestion.
>
> The most common bottlenecks turn out to be your broadband link, and also
> the 802.11 link between your device and your home network (in a home
> environment). Since the bandwidths are roughly comparable, the
> bottleneck shifts back and forth between them, and can easily be
> different in different directions.

Yes - though in my case for desktops it's generally the main internet or 
the other end's downstream, and for wireless it's usually 802.11 (I have 
FiOS something like 30 or 35Mbps down, 20Mbps up).

>>> Note the different coloured lines.  They represent the amount of
>>> buffering measured in the broadband edge in *seconds*.  Also note that
>>> for various reasons, the netalyzr data is actually likely
>>> underestimating the problem.
>>
>> Understood.  Though that's not entirely relevant to this problem,
>> since the congestion-control mechanisms we're using/designing here are
>> primarily buffer-sensing algorithms that attempt to keep the buffers
>> in a drained state.  If there's no competing traffic at the
>> bottleneck, they're likely to do so fairly well, though more
>> simulation and real-world tests are needed.  I'll note that several
>> organizations (Google/GIPS, Radvision and my old company WorldGate)
>> had found that these types of congestion-control algorithms are quite
>> effective in practice.
>
> Except that both transient bufferbloat (e.g. what I described in my
> screed against IW10), and even a single long-lived TCP flow for many
> applications (on anything other than Windows XP), will fully saturate
> the link.

Yes, agreed (as I mentioned below).  In *most* cases this class of 
algorithm still tends to work fairly well.
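
For concreteness, the core of such a buffer-sensing scheme looks 
roughly like this -- a minimal sketch in Python, where the constants 
and names are mine, not anything GIPS, Radvision, or WorldGate shipped:

    # Delay-sensing rate adaptation, a sketch.  Track the minimum
    # one-way delay as the empty-queue baseline; back off when the
    # queue above that baseline grows, probe upward when it drains.
    class DelaySensingController:
        def __init__(self, rate_bps):
            self.rate = rate_bps
            self.base_delay = None

        def on_delay_sample(self, owd):
            if self.base_delay is None or owd < self.base_delay:
                self.base_delay = owd
            queue_delay = owd - self.base_delay
            if queue_delay > 0.050:    # queue building: back off hard
                self.rate *= 0.85
            elif queue_delay < 0.010:  # queue drained: probe gently upward
                self.rate *= 1.05
            return self.rate

The point is that it keys off *delay* rather than loss, so in the 
common case it backs off long before the bloated buffers fill.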

>> More importantly, perhaps, bufferbloat combined with the high 'burst'
>> nature of browser network systems (and websites) optimizing for
>> page-load time means you can get a burst of data at a congestion point
>> that isn't normally the bottleneck.
>
> Yup. Ergo the screed, trying to get people to stop before making things
> worse.  The irony is that I do understand that, were it not for the fact
> that browsers have long since discarded HTTP's 2 connection rule, it
> might be a good idea, and help encourage better behaviour.

SPDY might help some here (though part of SPDY's purpose is to keep its 
single TCP connection even more fully saturated, so maybe not).

We can't control other browsers/devices on the same connection; we may 
be able to control other code within the same browser.



>
>>
>> The basic scenario goes like this:
>>
>> 1. established UDP flow near bottleneck limit at far-end upstream
>> 2. near-end browser (or browser on another machine in the same house)
>>     initiates a page-load
>> 3. near-end browser opens "many" tcp connections to the site and
>>     other sites that serve pieces (ads, images, etc) of the page.
>> 4. Rush of response data saturates the downstream link to the
>>     near-end, which was not previously the bottleneck.  Due to
>>     bufferbloat, this can cause a significant amount of data to be
>>     temporarily buffered, delaying competing UDP data significantly
>>     (tenths of a second, perhaps >1 second in some cases).  This is hard
>>     to model accurately; real-world tests are important.
>
> I've seen up to 150ms on 50Mbps cable service.  Other experiments,
> particularly controlled ones, very welcome.
>
> At 10Mbps, that could be over .5 seconds.  And it depends on the web
> site and whether IW10 has been turned on.
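
For scale, that figure is just the buffered bytes divided by the link 
rate; e.g. (burst size illustrative):

    # added queueing delay = bytes sitting in the buffer / link rate
    link_rate = 10e6 / 8            # 10 Mbit/s, in bytes/second
    burst     = 640 * 1024          # e.g. 640 KB of page-load responses
    print(burst / link_rate)        # ~0.52 s of delay for competing UDP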

One of the more interesting issues for me is how "normal" browsing 
affects this, and for how long.  A single burst of buffer-fullness can 
be recovered from fairly quickly if you're aggressive about throwing 
away buffered data in order to ride the delay curve back down: you'll 
have hundreds of ms (or even seconds) of queued RTP data that will all 
come in at well above realtime speed, depending on the next-worst 
bottleneck after the main one.
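
Concretely, "aggressive" means dropping stale frames from the playout 
queue instead of playing them late -- a rough sketch, with policy knobs 
that are entirely made up:

    # Drain a backlogged playout queue after a delay spike (sketch).
    # 'queue' holds (capture_ts, frame) pairs in capture order; frames
    # that should have played more than `max_late` seconds ago are
    # dropped rather than stretching the backlog across the whole call.
    def drain_backlog(queue, now, playout_delay, max_late=0.120):
        return [(ts, frame) for ts, frame in queue
                if now - (ts + playout_delay) <= max_late]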

>> Now, in practice this is likely messier and the pageload doesn't
>> generate a huge sudden block of data that fills the buffers, so
>> there's some upward slope to delay as you head to saturation of the
>> downstream buffers.  And there's very little you can do about this -
>> and backing off a lot may help in that the less data you put onto the
>> end of this overloaded queue (assuming the pageload flow has ended or
>> soon will), the sooner the queue will drain and low-latency will be
>> re-established.
>
> and your poor jitter buffers are *really* unhappy.

Maybe; it depends (see above).


>> Given the loss-based algorithms for TCP/etc, yes.  We have to figure
>> out how to (as reliably *as possible*) deliver low-latency data in
>> this environment.
>
> Personal Opinion
> -----------------------
>
> Well, here's my honest opinion, formed over the last 15 months.
>
> We can't really make our jitter buffers so big as to make for decent
> audio/video when bufferbloat is present, unless you like talking to
> someone half way to the moon (or further).  Netalyzr shows the problem
> in broadband, but our OS's and home routers are often even worse.

The jitter buffers don't have to be that large - in steady-state 
bufferbloat you simply have a lot of delay.  You do have to manage 
delay some, but constant delay in the network doesn't directly affect 
the jitter buffer.  Transitions in and out of bufferbloat will, but the 
jitter buffer should handle those.
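
Put differently, the buffer should be sized to the *variation* in 
delay, not its absolute level.  A minimal sketch (constants are 
illustrative):

    # Jitter-buffer target sized to delay variation (sketch).
    # `samples` is a non-empty window of recent one-way delays; steady
    # bloat raises the mean but not the spread, so the target stays small.
    def jitter_target(samples, k=4.0, floor=0.020):
        mean = sum(samples) / len(samples)
        var  = sum((s - mean) ** 2 for s in samples) / len(samples)
        return max(floor, k * var ** 0.5)   # a few std-devs of headroom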

> Even one TCP connection (moving big data), can induce severe latency on
> a large fraction of the existing broadband infrastructure; as Windows XP
> retires and more and more applications deploy (e.g. backup, etc.), I
> believe we'll hurt more and more.
>
> It's impossible to make any servo system work faster than the RTT time;
> and bufferbloat causes that to sometimes go insane.
>
> We can't forklift upgrade all the TCP implementations, which will
> compete with low-latency audio/video.  So "fixing" TCP isn't going to
> happen fast enough to be useful.  That doesn't mean it shouldn't happen,
> just that it's a 5-15 year project to do so.

Agreed.  Such things get locked in stone.

> We do have to do congestion avoidance, as well as TCP would do (if
> bufferbloat weren't endemic).
>
> Delay based congestion avoidance algorithms are likely to lose relative
> to loss based ones, as far as I understand.  So that means that the same
> issue applies as "fixing" TCP.

Yes - though if you're not sharing the bottleneck link with problematic 
flows, these sorts of algorithms work reasonably well in practice.  You 
might have to be willing to be slightly non-TCP-friendly; the downside 
is that in order to make TCP back off, you need to let the buffers at 
the bottleneck fill up, at least temporarily.
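
A plausible heuristic for when to stop being polite (thresholds 
entirely illustrative):

    # Guess whether a loss-based (TCP-like) flow owns the bottleneck
    # queue: if queueing delay stays high for a while *after* we've
    # backed off, backing off further just donates bandwidth to the
    # competing flow, so hold our rate instead (sketch).
    def competing_loss_based_flow(queue_delays, we_backed_off,
                                  high=0.100, window=20):
        recent = queue_delays[-window:]
        return we_backed_off and len(recent) == window and min(recent) > high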

> So the conclusion I came to this time last year was that bufferbloat was
> a disaster for the immersive teleconferencing I'm supposed to be working
> on, and I switched to working solely on bufferbloat, and getting it
> fixed.  Because to make any of this work well (which means not
> generating service calls), we have to fix it.
>
> Timestamps are *really* useful to detect bufferbloat, and detecting
> bufferbloat suffering is key to getting people aware of it and motivated
> to fix it.  I'd really like to be able to tell people what's going on in
> a reliable way, to motivate them to at least fix the gear under their
> control and/or provide pressure on those they pay to provide service.
> Identifying where the bottleneck is that is at fault is key to this.
>
> So we have to provide "back pressure" into the economic system to get
> people to fix the network.  But trying to engineer around this entirely
> I believe is futile and counter-productive: we have to fix the
> Internet.  To fix the broadband edge will cost of order $100/subscriber:
> this isn't an insane price to pay, as even one or two service calls cost
> more than that.

Well, I'm not sure what we can do to help, but perhaps it behooves us to 
help provide feedback to customers if we appear to be losing against TCP 
flows due to bufferbloat; perhaps even build user-oriented sites to help 
users mitigate the problems as best they can.  (Tough, but could help some.)

>
> Does this mean we're doomed?
> -----------------------------------------
>
> I hope not. I think there is going to have to be a multi-prong attack on
> the problem.
>
> My sense is that the worst problem is in the home and on wireless
> networks.  As I can't work on the wireless networks except 802.11, I've
> focussed there.  But in the home, courtesy of Linux being commonly used
> in home routers, we have the ability to do a whole lot.
>
> In the short/immediate term, mitigations are possible.  My home network
> now works tremendously better than it did a year ago, and yours can
> immediately too, even with many existing home routers.  But doing so is
> probably beyond non-network wizards today.

Yes.  Useful for looking into, but not for solving the problem.  (And 
for pressuring router makers - but that's a near-zero-margin game for 
most of them.)

> The CeroWrt build of OpenWrt is a place where we're working on both
> mitigations and solutions for bufferbloat in home routers (along with
> other things that have really annoyed us about what we can buy
> commercially).  See: http://www.bufferbloat.net/news/19 . Please come
> help out.  The immediate mitigations include just tuning the router's
> buffering to something more sensible.
>
> Over the next several months, we hope to start testing AQM algorithms.
>
> Note that even the traditional 100ms "rule of thumb" for buffer sizing
> http://gettys.wordpress.com/2011/07/06/rant-warning-there-is-no-single-right-answer-for-buffering-ever/
> is still too high; we really need AQM in our home routers, our
> broadband gear, and our operating systems.  The long term telephony
> standard for "good enough" latency is 150ms, and to leave the gate
> having already lost 100ms isn't good; if both ends are congested, you
> are at 200ms + the delay in the network.
>
> Now, if you are willing to bandwidth shape your broadband service
> strongly, you can already do much better than 100ms today.  That
> requires you to tune your home router (if it is capable).  I'll be
> posting a more "how to" entry in the blog sometime soon; but network
> geeks should be able to hack what I wrote before at:
> http://gettys.wordpress.com/2010/12/13/mitigations-and-solutions-of-bufferbloat-in-home-routers-and-operating-systems/
> Your other strategy, as I'll outline in my how-to-ish document, is to
> ensure your wireless bandwidth is *always* higher than that of your
> broadband bandwidth; ensuring the bottleneck is at a point in the
> network you can control.  This still doesn't solve downstream transient
> bufferbloat at bad web sites, but I think it will solve
> upstream/downstream elephant flows killing you.
>
> Another form of mitigation is getting the broadband buffering back under
> control.  That will get us back to the vicinity of the traditional "rule
> of thumb".
> http://gettys.wordpress.com/2011/07/13/progress-on-the-cable-front/
> which is a lot better than where we are now.
>
> Since I wrote that, I've confirmed (most recently last week) that the
> cable modem and CMTS changes are well under way; it appears sometime
> mid/late 2012 that will start deployment.  You may need to buy a new
> cable modem when the time comes (though ones with the upgrade will
> probably start shipping this year).  I have no clue if older existing
> cable modems will ever see firmware upgrades, though I predict DOCSIS 2
> modems almost certainly will not. I am hopeful that their motion will
> force mitigation into DSL and fiber at least eventually.  But this just
> gets us back to the 100ms range (maybe worse, given powerboost).

That's great.
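
On the shaping mitigation above: the sizing is simple arithmetic once 
you pick a latency budget.  A sketch, assuming a 20Mbps uplink like 
mine and a 50ms budget:

    # Size a shaper so the queue forms in gear you control (sketch).
    # Shape a bit below the measured link rate so the modem's buffer
    # stays empty, then cap your own queue at the latency budget.
    link_bps     = 20e6                # measured uplink
    shaped_bps   = link_bps * 0.90     # keep the bottleneck local
    target_delay = 0.050               # 50 ms worst-case queueing
    queue_limit  = shaped_bps / 8 * target_delay
    print(shaped_bps, queue_limit)     # 18 Mbit/s, ~112 KB of queue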

>
> Obviously, if your network operator doesn't run AQM, then they should
> and you should help educate them.
>
> Solutions
> ======
> Solutions come in a number of forms.
>
> We need AQM that works and is self-tuning.  And we need it even in our
> operating systems.  The challenge here is that classic RED 93 and
> similar algorithms won't work in the face of highly variable bandwidth.
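
Right - and the core reason is that a fixed queue-length threshold is a 
completely different amount of *delay* depending on the instantaneous 
rate:

    # The same 64 KB average-queue threshold, expressed as delay, at
    # the kinds of rate swings 802.11 sees -- static RED tuning can't win.
    for mbps in (1, 10, 54):
        print(mbps, "Mbit/s:", 64e3 * 8 / (mbps * 1e6), "s of queue")
    # 1 Mbit/s: 0.512 s    10 Mbit/s: 0.051 s    54 Mbit/s: ~0.009 s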
>
> Traffic classification can at best move who suffers when, but doesn't
> fix the problem.  I still want it, however.  To do it for real in the
> broadband edge will be "interesting", as to who classifies traffic
> how; today, most broadband systems have exactly one queue that you
> have access to (though the technologies will often support multiple
> queues).  Carrier VOIP is generally provisioned separately; carriers
> have an (unintended, I believe) fundamental advantage right now.  It
> turns out that diffserv has been discovered by (part of) the gaming
> industry, which noticed that Linux's PFIFO-FAST queue discipline
> implements diffserv.  So you can get some help by marking traffic.
> Andrew McGregor had the interesting idea that maybe the broadband
> headends could observe how traffic is being marked and classify
> similarly in the downstream direction.
>
> Even with only one queue, at least we can control what happens in the
> upstream direction (at least if we can keep the buffers from filling in
> the broadband gear).  In the short term, bandwidth shaping is our best
> tool, and I'm working on other ideas as well.
>
> Getting all the queues lined up is still going to take some effort,
> between diffserv marking, 802.11 queues, ethernet queues, etc...
>
> I also believe that we need the congestion exposure stuff going on in
> the IETF in the long term, to provide disincentives for abuse of the
> network, as well as proper accounting of congestion.
>
> What should this group do?
> ================
>
> I have not seen a way to really engineer around bufferbloat at the
> application layer, nor even in the network stack. It's why I'm working
> on bufferbloat rather than teleconferencing, which I was hired to work
> on; if we don't fix that, we can't really succeed properly on the
> teleconferencing front.
>
> I believe therefore:
>      o work on the real time applications problem should not stop in the
> meanwhile; it is the compelling set of applications to motivate fixing
> the Internet.
>        o exposing the bloat problem so that blame can be apportioned is
> *really* important.  Timestamps in RTP would help greatly in doing
> so.  Modern TCPs (may) have the TCP timestamp option turned on (I know
> modern Linux systems do), so I don't know of anything needed there
> beyond ensuring the TCP information is made available somehow, if it
> isn't already. Being able to reliably tell people: "The network is
> broken, you need to fix (your OS/your router/your broadband gear)." is
> productive.  And to deploy IPv6, we're looking at deploying new home
> kit anyway.

We can look into that.  Suggestions welcome.
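
One concrete suggestion: RTP's existing timestamps are enough to expose 
queueing delay, with no clock synchronization needed.  A minimal sketch 
(90kHz clock assumed; 32-bit timestamp wrap ignored for brevity):

    # Relative one-way delay from RTP timestamps (sketch).  Sender and
    # receiver clocks aren't synchronized, so 'owd' carries an unknown
    # constant offset; the growth above the smallest value seen is the
    # queueing component -- exactly the bufferbloat signal we want.
    class OwdTracker:
        RTP_CLOCK = 90000.0           # e.g. video; audio clocks differ

        def __init__(self):
            self.floor = None

        def queue_delay(self, rtp_ts, arrival_s):
            owd = arrival_s - rtp_ts / self.RTP_CLOCK
            if self.floor is None or owd < self.floor:
                self.floor = owd
            return owd - self.floor

If that number climbs into the hundreds of milliseconds while the media 
rate is steady, you can point a finger at a bloated queue on the path.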

>      o designing good congestion avoidance that will work in an
> unbroken, unbloated network is clearly needed.  But I don't think heroic
> engineering around bufferbloat is worthwhile right now for RTP; that
> effort is better put into the solutions outlined above, I think.  Trying
> to do so when we've already lost the war (teleconferencing isn't
> interesting when talking half way to the moon) is not productive, and
> getting stable servo systems to work not just at the 100ms level but
> at the multi-second level, when multi-second latency isn't even usable
> for the application, is a waste.  RTP == Real-Time Transport Protocol, when
> the network is no longer real time, is an oxymoron.

In practice it really does work most of the time.  But not all.

>      o worrying about how to get diffserv actually usable (so that we can
> classify at the broadband head end) seems worthwhile to me.  I'd like to
> get the web mice (transient bufferbloat), to not interfere with
> audio/video traffic.  I like Andrew McGregor's idea, but don't know if
> it will hold water.  That we can expect diffserv to sort of work in the
> upstream direction already is good news; but we also need downstream to
> work.
>      o come help on the home router problem; if you want teleconferencing
> to really work well, it needs lots of TLC.  And we have the ability to
> not just write specs, but to demonstrate working code here.
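
On the marking front, applications can already set DSCP on their own 
sockets today; e.g. in Python on Linux (other OSes vary):

    import socket

    # Mark a media socket EF (DSCP 46) so PFIFO-FAST and any
    # diffserv-aware hop can prioritize it.  DSCP occupies the top
    # six bits of the old TOS byte, hence the shift.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, 46 << 2)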



-- 
Randell Jesup
randell-ietf at jesup.org

