[R-C] Most problems are at the bottleneck: was Re: bufferbloat-induced delay at a non-bottleneck node

Henrik Lundin henrik.lundin at webrtc.org
Thu Oct 13 22:02:32 CEST 2011


Wow, that was quite a rundown of the bufferbloat problem. Thanks for that.

Skipping down to what this group should do, I interpret your input as "make
the real-time CC algorithms work really well in a non-bufferbloated
scenario". I welcome this idea; I've taken the same stance myself for quite
some time. In the case of insane buffer sizes, we will always run the risk
of having someone else (typically TCP flows) pee in our pants for us, if you
pardon the analogy. No matter how far we back off, we're going to see full
buffers rendering two-way conversation useless. I do think that
delay-sensing CC is the way to go anyway, since it can avoid filling large
buffers when it's *not* competing with loss-sensing CCs on the bottleneck
link -- I'm optimistic and count that as a half-victory against
bufferbloat. :)
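
To make that concrete, here is a stripped-down sketch (in Python, with
made-up names and thresholds -- not our actual implementation) of the kind
of delay-sensing control loop I mean: detect over-use from growing one-way
delay, then drop the target to beta times the throughput we actually
received over the last second, so the bottleneck queue can drain:

# Illustrative only -- names, thresholds and structure are made up.
BETA = 0.85                 # back off to 85% of the measured throughput
OVERUSE_THRESHOLD = 0.010   # growth of one-way delay (s) treated as over-use

class DelaySensingController:
    def __init__(self, start_bps, min_bps=30000):
        self.target_bps = start_bps
        self.min_bps = min_bps
        self.window = []        # (arrival_time, payload_bytes) in last second
        self.min_owd = None     # smallest relative one-way delay seen so far

    def on_packet(self, now, owd, size_bytes):
        # Track received throughput over a sliding 1-second window.
        self.window.append((now, size_bytes))
        self.window = [(t, n) for (t, n) in self.window if now - t <= 1.0]

        # Queueing delay ~ growth of one-way delay above its minimum.
        if self.min_owd is None or owd < self.min_owd:
            self.min_owd = owd
        if owd - self.min_owd > OVERUSE_THRESHOLD:
            self._back_off()

    def _back_off(self):
        # Throughput that evidently was feasible during the last second.
        measured_bps = 8.0 * sum(n for _, n in self.window)
        # Any beta < 1 should let the bottleneck queue drain.
        self.target_bps = max(self.min_bps, BETA * measured_bps)

The point is that the back-off is anchored to a rate we know just worked,
so the queue drains without the rate collapsing to zero; and when no
loss-based flow shares the bottleneck, we react long before a bloated
buffer fills.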

It would definitely be interesting if a real-time communication application
could alert the user that the network is broken. However, I do think that it
would have to be more specific for the average user to direct the
blame to the right party (far end or near end, ISP or home equipment, or
OS).

/Henrik



On Thu, Oct 13, 2011 at 4:33 PM, Jim Gettys <jg at freedesktop.org> wrote:

> Sorry for the length of this.
>
> Problems are usually at the bottleneck, unless you are suffering from
> general network congestion.
>
> The most common bottlenecks turn out to be your broadband link, and also
> the 802.11 link between your device and your home network (in a home
> environment). Since the bandwidths are roughly comparable, the
> bottleneck shifts back and forth between them, and can easily be
> different in different directions.
>
> There can be problems elsewhere: peering points and firewall gateways
> are also common problems, along with general congestion in networks that
> are being run without AQM (which is, unfortunately, a significant
> fraction of the Internet, and I hypothesise even more common in
> corporate networks).
>
> That TCP congestion avoidance is mostly toast is not good news: we could
> see a return of classic congestion collapse, and I have an unconfirmed
> report of one significant network having done so.
>
> On 10/12/2011 01:12 AM, Randell Jesup wrote:
> > Jim:  We're moving this discussion to the newly-created mailing
> > sub-list -
> >    Rtp-congestion at alvestrand.no
> >    http://www.alvestrand.no/mailman/listinfo/rtp-congestion
> >
> > If you'd like to continue this discussion (and I'd love you to do so),
> > please join the mailing list.  (Patrick, you may want to join too and
> > read the very small backlog of messages (perhaps 10 so far)).
> >
> > On 10/11/2011 4:17 PM, Jim Gettys wrote:
> >> On 10/11/2011 03:11 AM, Henrik Lundin wrote:
> >>>
> >>>
> >>> I do not agree with you here. When an over-use is detected, we propose
> >>> to measure the /actual/ throughput (over the last 1 second), and set
> >>> the target bitrate to beta times this throughput. Since the measured
> >>> throughput is a rate that evidently was feasible (at least during that
> >>> 1 second), any beta<  1 should assert that the buffers get drained,
> >>> but of course at different rates depending on the magnitude of beta.
> >> Take a look at the data from the ICSI netalyzr: you'll find scatter
> >> plots at:
> >>
> >>
> http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
> >>
> >>
> >> Note the different coloured lines.  They represent the amount of
> >> buffering measured in the broadband edge in *seconds*.  Also note that
> >> for various reasons, the netalyzr data is actually likely
> >> underestimating the problem.
> >
> > Understood.  Though that's not entirely relevant to this problem,
> > since the congestion-control mechanisms we're using/designing here are
> > primarily buffer-sensing algorithms that attempt to keep the buffers
> > in a drained state.  If there's no competing traffic at the
> > bottleneck, they're likely to do so fairly well, though more
> > simulation and real-world tests are needed.  I'll note that several
> > organizations (Google/GIPS, Radvision and my old company WorldGate)
> > had found that these types of congestion-control algorithms are quite
> > effective in practice.
>
> Except that both transient bufferbloat (e.g. what I described in my
> screed against IW10) and even a single long-lived TCP flow, for many
> applications (on anything other than Windows XP), will fully saturate
> the link.
>
>
> >
> > However, it isn't irrelevant to the problem either:
> >
> > This class of congestion-control algorithms is subject to "losing" if
> > faced with a sustained high-bandwidth TCP flow like some of your
> > tests, since they back off when TCP isn't seeing any restriction
> > (loss) yet. Eventually TCP will fill the buffers.
>
> Exactly.  In my tests, it took about 10 seconds to fill a 1 second
> buffer (which is not atypical of the current amount of bloat).
>
> >
> > More importantly, perhaps, bufferbloat combined with the high 'burst'
> > nature of browser network systems (and websites) optimizing for
> > page-load time means you can get a burst of data at a congestion point
> > that isn't normally the bottleneck.
>
> Yup. Ergo the screed, trying to get people to stop before making things
> worse.  The irony is that I do understand that, were it not for the fact
> that browsers have long since discarded HTTP's two-connection rule, IW10
> might be a good idea, and help encourage better behaviour.
>
>
> >
> > The basic scenario goes like this:
> >
> > 1. established UDP flow near bottleneck limit at far-end upstream
> > 2. near-end browser (or browser on another machine in the same house)
> >    initiates a page-load
> > 3. near-end browser opens "many" tcp connections to the site and
> >    other sites that serve pieces (ads, images, etc) of the page.
> > 4. Rush of response data saturates the downstream link to the
> >    near-end, which was not previously the bottleneck.  Due to
> >    bufferbloat, this can cause a significant amount of data to be
> >    temporarily buffered, delaying competing UDP data significantly
> >    (tenths of a second, perhaps >1 second in cases).  This is hard
> >    to model accurately; real-world tests are important.
>
> I've seen up to 150ms on 50Mbps cable service.  Other experiments,
> particularly controlled ones, very welcome.
>
> At 10 Mbps, that could be over 0.5 seconds.  And it depends on the web
> site and whether IW10 has been turned on.
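>
> (Back-of-the-envelope on those numbers, holding the burst size constant --
> the same amount of queued data hurts more the slower the link:)
>
> # 150 ms of queue at 50 Mbit/s is roughly 940 KB of buffered data;
> # the identical burst at 10 Mbit/s takes ~0.75 s to drain.
> burst_bits = 0.150 * 50e6        # ~7.5e6 bits queued at 50 Mbit/s
> print(burst_bits / 10e6)         # ~0.75 seconds at 10 Mbit/s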
>
> > 5. Congestion-control algorithm notices transition to buffer-
> >    induced delay, and tells the far side to back off.  The latency
> >    of this decision may help us avoid over-reacting, as we have to
> >    see increasing delay which takes a number of packets (at least
> >    1/10 second, and easily could be more).  Also, the result of
> >    the above "inrush"/pageload-induced latency may not trigger the
> >    congestion mechanisms we discuss here, as we might see a BIG jump
> >    in delay followed by steady delay or a ramp down (since if the
> >    buffer has suddenly jumped from drained to full, all it can do is
> >    be stable or drain).
> >
> > Note that Google's current algorithm (which you comment on above) uses
> > recent history for choosing the reduction; in this case it's hard to
> > say what the result would be: if it invokes the backoff at the start
> > of the pageload, then the bandwidth received recently is the current
> > bandwidth, so the new bandwidth is current minus small_delta.  If it
> > happens after data has queued behind the burst of TCP traffic, then
> > when the backoff is generated we'll have gotten almost no data through
> > "recently" and we may back off all the way to min bandwidth; an
> > over-reaction, depending on the time constant and level of how fast
> > that burst can fill the downstream buffers.
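> >
> > (To make the failure mode concrete, here is a purely illustrative
> > sketch -- not the actual algorithm -- of a backoff anchored to the
> > recently received rate, but floored so that a starved measurement
> > window doesn't drive us straight to the minimum:)
> >
> > # Illustrative only; names and the 0.5 floor are made up.
> > def back_off(current_bps, recent_recv_bps, min_bps, beta=0.85):
> >     new_bps = beta * recent_recv_bps
> >     # Guard against over-reaction when the window was starved by a
> >     # competing TCP burst and almost nothing arrived "recently".
> >     new_bps = max(new_bps, min_bps, 0.5 * current_bps)
> >     return min(new_bps, current_bps)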
> >
> > Now, in practice this is likely messier and the pageload doesn't
> > generate a huge sudden block of data that fills the buffers, so
> > there's some upward slope to delay as you head to saturation of the
> > downstream buffers.  And there's very little you can do about this -
> > and backing off a lot may help in that the less data you put onto the
> > end of this overloaded queue (assuming the pageload flow has ended or
> > soon will), the sooner the queue will drain and low-latency will be
> > re-established.
>
> and your poor jitter buffers are *really* unhappy.
>
> >
> > Does the ICSI data call out *where* the buffer-bloat occurs?
>
> The data is primarily the broadband connection.
>
> The reason we can say that confidently is that the structure present in
> the plot falls on power-of-two increments.
>
> Buffering in the home router, or in your host operating system, is in #
> of packets, typically 1500 bytes, and would not show that.
>
> Also note that your home router and host are often *much worse* than the
> broadband connection.
>
> On current hardware, there is typically 200-300 packets of buffering
> just in the ring buffers of the network device.  On top of that, there
> may be even another 1000 packets of buffering (e.g. the transmit
> queue in Linux).
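>
> (To turn those packet counts into time: on Linux you can read the
> interface's transmit queue length from sysfs and divide by your uplink
> rate.  Interface name and rate below are placeholders for your own setup.)
>
> IFACE = "eth0"                      # placeholder
> UPLINK_BPS = 10e6                   # your actual upstream rate
> with open("/sys/class/net/%s/tx_queue_len" % IFACE) as f:
>     txqueuelen = int(f.read())
> # 1000 packets * 1500 bytes at 10 Mbit/s is 1.2 seconds of potential
> # delay, before the driver rings or the broadband gear add their share.
> print(txqueuelen * 1500 * 8 / UPLINK_BPS)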
>
> And there can be bloat elsewhere: peering disputes seem to be causing
> these, along with firewall relays.
>
> RED is not being deployed where it should be, and you can have
> application bufferbloat at firewall relays.
> http://gettys.wordpress.com/2010/12/17/red-in-a-different-light/
>
> >
> >> Then realise that when congested, nothing you do can react faster than
> >> the RTT including the buffering.
> >>
> >> So if your congestion is in the broadband edge (where it often/usually
> >> is), you are in a world of hurt, and you can't use any algorithm that
> >> has fixed time constants, even one as long as 1 second.
> >>
> >> Wish this weren't so, but it is.
> >>
> >> Bufferbloat is a disaster...
> >
> > Given the loss-based algorithms for TCP/etc, yes.  We have to figure
> > out how to (as reliably *as possible*) deliver low-latency data in
> > this environment.
>
> Personal Opinion
> -----------------------
>
> Well, here's my honest opinion, formed over the last 15 months.
>
> We can't really make our jitter buffers so big as to make for decent
> audio/video when bufferbloat is present, unless you like talking to
> someone halfway to the moon (or further).  Netalyzr shows the problem
> in broadband, but our OS's and home routers are often even worse.
>
> Even one TCP connection (moving big data) can induce severe latency on
> a large fraction of the existing broadband infrastructure; as Windows XP
> retires and more and more applications deploy (e.g. backup, etc.), I
> believe we'll hurt more and more.
>
> It's impossible to make any servo system work faster than the RTT;
> and bufferbloat sometimes makes that RTT go insane.
>
> We can't forklift upgrade all the TCP implementations, which will
> compete with low-latency audio/video.  So "fixing" TCP isn't going to
> happen fast enough to be useful.  That doesn't mean it shouldn't happen,
> just that it's a 5-15 year project to do so.
>
> We do have to do congestion avoidance, at least as well as TCP would
> (if bufferbloat weren't endemic).
>
> Delay-based congestion avoidance algorithms are likely to lose relative
> to loss-based ones, as far as I understand.  So the same issue applies
> as with "fixing" TCP.
>
> So the conclusion I came to this time last year was that bufferbloat was
> a disaster for the immersive teleconferencing I'm supposed to be working
> on, and I switched to working solely on bufferbloat, and getting it
> fixed.  Because to make any of this work well (which means not
> generating service calls), we have to fix it.
>
> Timestamps are *really* useful to detect bufferbloat, and detecting
> bufferbloat suffering is key to getting people aware of it and motivated
> to fix it.  I'd really like to be able to tell people what's going on in
> a reliable way, to motivate them to at least fix the gear under their
> control and/or provide pressure on those they pay to provide service.
> Identifying where the bottleneck at fault is located is key to this.
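>
> (The measurement itself is cheap: with sender timestamps on the media
> packets, the receiver can track how far the one-way delay has grown
> above its recent minimum, without synchronized clocks.  A sketch, with
> made-up names:)
>
> class QueueDelayEstimator:
>     def __init__(self):
>         self.base = None             # smallest (arrival - send) offset seen
>
>     def update(self, send_ts, arrival_ts):
>         offset = arrival_ts - send_ts    # includes the unknown clock offset
>         if self.base is None or offset < self.base:
>             self.base = offset           # re-baseline on a new minimum
>         return offset - self.base        # ~ queueing delay above baseline
>
> A sustained estimate of hundreds of milliseconds (or seconds) while a
> bulk transfer runs is exactly the signature you want to show users.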
>
> So we have to provide "back pressure" into the economic system to get
> people to fix the network.  But trying to engineer around this entirely
> I believe is futile and counter-productive: we have to fix the
> Internet.  Fixing the broadband edge will cost on the order of
> $100/subscriber: this isn't an insane price to pay, as even one or two
> service calls cost more than that.
>
> Does this mean we're doomed?
> -----------------------------------------
>
> I hope not. I think there is going to have to be a multi-prong attack on
> the problem.
>
> My sense is that the worst problem is in the home and on wireless
> networks.  As I can't work on wireless networks other than 802.11, I've
> focussed there.  But in the home, courtesy of Linux being commonly used
> in home routers, we have the ability to do a whole lot.
>
> In the short/immediate term, mitigations are possible.  My home network
> now works tremendously better than it did a year ago, and yours can
> immediately too, even with many existing home routers.  But doing so is
> probably beyond non-network wizards today.
>
> The CeroWrt build of OpenWrt is a place where we're working on both
> mitigations and solutions for bufferbloat in home routers (along with
> other things that have really annoyed us about what we can buy
> commercially).  See: http://www.bufferbloat.net/news/19 . Please come
> help out.  The immediate mitigations include just tuning the router's
> buffering to something more sensible.
>
> Over the next several months, we hope to start testing AQM algorithms.
>
> Note that even the traditional 100ms "rule of thumb" buffer sizing
>
> http://gettys.wordpress.com/2011/07/06/rant-warning-there-is-no-single-right-answer-for-buffering-ever/
> is still too high; we really need AQM in our home routers, our
> broadband gear, and our operating systems.  The long-term telephony
> standard for "good enough" latency is 150ms, and leaving the gate
> having already lost 100ms isn't good; if both ends are congested, you
> are at 200ms + the delay in the network.
>
> Now, if you are willing to bandwidth shape your broadband service
> strongly, you can already do much better than 100ms today.  That
> requires you to tune your home router (if it is capable).  I'll be
> posting a more how-to style entry in the blog sometime soon; but network
> geeks should be able to hack what I wrote before at:
>
> http://gettys.wordpress.com/2010/12/13/mitigations-and-solutions-of-bufferbloat-in-home-routers-and-operating-systems/
> Your other strategy, as I'll outline in my how-to-ish document, is to
> ensure your wireless bandwidth is *always* higher than your broadband
> bandwidth, so that the bottleneck is at a point in the network you can
> control.  This still doesn't solve downstream transient bufferbloat at
> bad web sites, but I think it will solve upstream/downstream elephant
> flows killing you.
>
> Another form of mitigation is getting the broadband buffering back under
> control.  That will get us back to the vicinity of the traditional "rule
> of thumb".
> http://gettys.wordpress.com/2011/07/13/progress-on-the-cable-front/
> which is a lot better than where we are now.
>
> Since I wrote that, I've confirmed (most recently last week) that the
> cable modem and CMTS changes are well under way; it appears that
> deployment will start sometime mid/late 2012.  You may need to buy a new
> cable modem when the time comes (though ones with the upgrade will
> probably start shipping this year).  I have no clue whether older
> existing cable modems will ever see firmware upgrades, though I predict
> DOCSIS 2 modems almost certainly will not.  I am hopeful that the cable
> industry's movement will force mitigation into DSL and fiber at least
> eventually.  But this just gets us back to the 100ms range (maybe worse,
> given PowerBoost).
>
> Obviously, if your network operator doesn't run AQM, then they should
> and you should help educate them.
>
> Solutions
> ======
> Solutions come in a number of forms.
>
> We need AQM that works and is self-tuning.  And we need it even in our
> operating systems.  The challenge here is that classic RED 93 and
> similar algorithms won't work in the face of highly variable bandwidth.
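>
> (For reference, classic RED in a few lines -- note that min_th and
> max_th are fixed queue-size thresholds, and picking them sensibly
> requires knowing the link rate, which is exactly what you don't have on
> a variable-bandwidth link.  Constants here are illustrative:)
>
> import random
>
> W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 5000, 15000, 0.1   # bytes; illustrative
> avg = 0.0
>
> def red_should_drop(queue_bytes):
>     # EWMA of queue occupancy; drop probability rises linearly between
>     # the two fixed thresholds (the count-based refinement is omitted).
>     global avg
>     avg = (1 - W_Q) * avg + W_Q * queue_bytes
>     if avg < MIN_TH:
>         return False
>     if avg >= MAX_TH:
>         return True
>     return random.random() < MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)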
>
> Traffic classification can at best move who suffers when, but doesn't
> fix the problem.  I still want it, however.  Doing it for real in the
> broadband edge will be "interesting", as to who classifies traffic and
> how; today, you typically get only one queue (though the technologies
> will often support multiple queues).
>
> Classification would also be really nice: but today, most broadband
> systems have exactly one queue that you have access to.  Carriers'
> VoIP is generally provisioned separately; they have an (unintended, I
> believe) fundamental advantage right now.  It turns out that diffserv
> has been discovered by (part of) the gaming industry, which noticed
> that Linux's PFIFO-FAST queue discipline implements diffserv.  So you
> can get some help by marking traffic.  Andrew McGregor had the
> interesting idea that maybe the broadband headends could observe how
> traffic is being marked and classify similarly in the downstream
> direction.
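>
> (Marking is at least cheap to try from the application: set the
> DSCP/TOS byte on the media socket, and pfifo_fast will typically put
> such packets in its highest-priority band on the sending host.  EF
> below is just one plausible choice of code point:)
>
> import socket
>
> # DSCP 46 (Expedited Forwarding); the TOS byte is DSCP << 2 = 0xB8.
> sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, 0xB8)
>
> Whether anything beyond your own host and router honours the marking
> is, of course, the open question above.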
>
> Even with only one queue, we can at least control what happens in the
> upstream direction (provided we can keep the buffers in the broadband
> gear from filling).  In the short term, bandwidth shaping is our best
> tool, and I'm working on other ideas as well.
>
> Getting all the queues lined up is still going to take some effort,
> between diffserv marking, 802.11 queues, ethernet queues, etc...
>
> I also believe that we need the congestion exposure stuff going on in
> the IETF in the long term, to provide disincentives for abuse of the
> network, as well as proper accounting of congestion.
>
> What should this group do?
> ================
>
> I have not seen a way to really engineer around bufferbloat at the
> application layer, nor even in the network stack. It's why I'm working
> on bufferbloat rather than teleconferencing, which I was hired to work
> on; if we don't fix that, we can't really succeed properly on the
> teleconferencing front.
>
> I believe therefore:
>    o work on the real-time applications problem should not stop in the
> meantime; it is the compelling set of applications to motivate fixing
> the Internet.
>      o exposing the bloat problem so that blame can be apportioned is
> *really* important.  Timestamps would help greatly here in RTP in doing
> so.  Modern TCPs (may) have the TCP timestamp option turned on (I know
> modern Linux systems do), so I don't know of anything needed there
> beyond ensuring the TCP information is made available somehow, if it
> isn't already.  Being able to reliably tell people "the network is
> broken, you need to fix (your OS / your router / your broadband gear)"
> is productive, and to deploy IPv6 we're looking at deploying new home
> kit anyway.
>    o designing good congestion avoidance that will work in an
> unbroken, unbloated network is clearly needed.  But I don't think heroic
> engineering around bufferbloat is worthwhile right now for RTP; that
> effort is better put into the solutions outlined above, I think.  Trying
> to do so when we've already lost the war (teleconferencing isn't
> interesting when you're talking halfway to the moon) is not productive,
> and getting stable servo systems to work not just at the 100ms level but
> at the multi-second level, when multi-second latency isn't even usable
> for the application, is a waste.  RTP == Real-Time Transport Protocol is
> an oxymoron when the network is no longer real time.
>    o worrying about how to get diffserv actually usable (so that we can
> classify at the broadband head end) seems worthwhile to me.  I'd like
> the web mice (transient bufferbloat) not to interfere with audio/video
> traffic.  I like Andrew McGregor's idea, but don't know if it will hold
> water.  That we can already expect diffserv to sort of work in the
> upstream direction is good news; but we also need downstream to work.
>    o come help on the home router problem; if you want teleconferencing
> to really work well, the home router needs lots of TLC.  And we have the
> ability not just to write specs, but to demonstrate working code here.
>
>                    - Jim