[R-C] Most problems are at the bottleneck: was Re: bufferbloat-induced delay at a non-bottleneck node

Jim Gettys jg at freedesktop.org
Fri Oct 14 23:18:51 CEST 2011


On 10/14/2011 04:28 PM, Randell Jesup wrote:
> On 10/13/2011 7:43 PM, Jim Gettys wrote:
>> On 10/13/2011 06:46 PM, Randell Jesup wrote:
>>> Yes - though in my case for desktops it's generally the main internet
>>> or the other end's downstream, and for wireless it's usually 802.11 (I
>>> have FiOS something like 30 or 35Mbps down, 20Mbps up).
>>
>> The problem is that wireless is highly variable, and roughly comparable
>> in bandwidth to broadband.  So the bottleneck moves back and forth
>> (particularly since wireless is a shared medium, so other devices on
>> it can reduce the available bandwidth).
>>
>> The best strategy for most home users is to try to get the bottleneck
>> firmly into the broadband link and use bandwidth shaping to control the
>> buffering there, since the host OS is not under your control.  So you
>> have a good excuse to go buy the shiny 802.11n router you have been
>> lusting after but haven't convinced your wife/husband to buy...  If you
>> do that, you can get really good behaviour today (until you wander too
>> far from your AP).
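
(For concreteness, the kind of shaping I mean, as a sketch only: this
assumes a Linux-based router with iproute2, "eth0" as the WAN-facing
interface, and a nominal 20 Mbit/s uplink; the names and numbers are
illustrative, not a recipe.)

# Shape egress to just below the uplink rate so the queue forms in our
# router, where tbf's "latency" knob keeps it short, instead of in the
# modem's oversized buffer.  Assumes Linux + iproute2.
import subprocess

WAN_IF = "eth0"        # hypothetical WAN-facing interface
RATE_KBIT = 18000      # ~90% of a nominal 20 Mbit/s uplink

subprocess.call(f"tc qdisc del dev {WAN_IF} root".split())  # clear old; may fail harmlessly
subprocess.check_call(
    f"tc qdisc add dev {WAN_IF} root tbf "
    f"rate {RATE_KBIT}kbit burst 32kbit latency 50ms".split())

Shaping slightly below the link rate keeps the modem's buffer from ever
filling; the 50ms latency cap bounds the queue we do allow.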
>
> That's fine for me, but that advice doesn't generally help our users.

Yeah, ergo shining the light on the problem.
>
>>>> Yup. Ergo the screed, trying to get people to stop before making
>>>> things
>>>> worse.  The irony is that I do understand that, were it not for the
>>>> fact
>>>> that browsers have long since discarded HTTP's 2 connection rule, it
>>>> might be a good idea, and help encourage better behaviour.
>>>
>>> SPDY might help some here (though part of SPDY's purpose is to
>>> continue to saturate that TCP connection even better, so maybe not).
>>
>> It helps the transient problem.  It won't help if you are using SPDY for
>> bulk download of something the way HTTP is often abused for.
>>
>> And it takes time for the buffers to fill, so it might help quite a
>> lot.  The buffers fill at roughly one packet per ACK, I gather; the
>> ACKs get further and further apart as the buffer fills.
>
> So, it might help if for no other reason than reducing the number of
> TCP connections and startups, and reducing the number of
> congestion-control streams.

Even one TCP stream will fill the buffers...

Using fewer connections reduces the transient problem.
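
Back-of-the-envelope, to make the scale concrete (made-up but typical
numbers):

# Queueing delay added by a full buffer is queued bits / link rate.
buffer_packets = 256          # a common default device queue length
packet_bytes = 1500
uplink_bps = 1_000_000        # a 1 Mbit/s uplink

delay = buffer_packets * packet_bytes * 8 / uplink_bps
print(f"{delay:.1f} s of added delay")   # ~3.1 s from one saturating flow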

>
>>> We can't control other browsers/devices on the same connection; we may
>>> be able to control other code within the same browser.
>
> My point is that while external flows are outside our control,
> internal browser TCP flows are within our control.
>
>>>> We can't really make our jitter buffers so big as to make for decent
>>>> audio/video when bufferbloat is present, unless you like talking to
>>>> someone half way to the moon (or further).  Netalyzr shows the problem
>>>> in broadband, but our OS's and home routers are often even worse.
>>>
>>> The jitter buffers don't have to be that large - in steady state you
>>> simply have a lot of delay.  You do have to manage delay somewhat,
>>> but steady delay in the network doesn't directly affect the jitter
>>> buffer.  Transitions in and out of bufferbloat will, but the jitter
>>> buffer should handle that.
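
(For reference, the standard way an RTP stack tracks this is the RFC
3550 interarrival-jitter estimator; a minimal sketch, where "transit"
is arrival time minus the packet's RTP timestamp, in the same units:)

# RFC 3550: J += (|D(i-1,i)| - J) / 16, where D is the change in
# transit time (arrival - RTP timestamp) between consecutive packets.
# An adaptive playout buffer can target current delay plus a few J.
def update_jitter(jitter, transit, last_transit):
    d = abs(transit - last_transit)
    return jitter + (d - jitter) / 16.0

Which is Randell's point in other words: J tracks variation around the
current delay, so a steady bloated queue costs absolute latency but not
jitter-buffer depth; the transitions are what the buffer has to ride out.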
>>
>> I fear the spikes I see in my packet traces.  I see multiple retransmits
>> and a bunch of packets out of order each time I go through one of the
>> buffer fill cycles.
>
> Not something I expect with RTP data - we don't retransmit on drops.
> When looking at TCP data I would expect retransmits once those queues
> fill.

I'm referring to the fact that there are multiple packet losses very
close together in time.

>
>>>> In the short/immediate term, mitigations are possible.  My home
>>>> network
>>>> now works tremendously better than it did a year ago, and yours can
>>>> immediately too, even with many existing home routers.  But doing
>>>> so is
>>>> probably beyond non-network wizards today.
>>>
>>> Yes.  Useful for looking into, but not for solving the problem.  (And
>>> pressuring router makers - but that's a near-0-margin game for most of
>>> them.)
>>
>> Again, the approach I have is to build a home router that actually works
>> right; ergo CeroWrt; the vendors can pick up the results as they see
>> fit.
>
> That's about the only obvious way; they mostly license the base router
> code from the HW vendor or a 3rd-party SW vendor, then put their
> "corporate UI" and some features on top of it, from what I can tell.
>
> The problem I would expect is that "hobbyist" router firmware is often
> not usable by manufacturers for licensing reasons; or, if it is, it's
> too hard to reskin in their corporate layout, or too hard for them to
> easily configure out the stuff they don't want, etc.  And there's no
> one *trying* to sell them on this, unless you can get the
> SoC/reference-design people to pick it up.

Actually, OpenWrt is the "upstream" for some of the smaller router
vendors already.  And yes, I'm trying to get people to realise that
having a good upstream is better than where they are today.  Only time
will tell if we succeed.

And it's our way to get changes/fixes into the upstream projects that
everybody uses, though the large commercial vendors currently ship bits
that have fermented (rotted) for 5 years or more.  So the way I look at
it, at worst the fixes eventually trickle into the commercial code
base, and some will ship much faster.

>
>>>>         o exposing the bloat problem so that blame can be
>>>> apportioned is *really* important.  Timestamps in RTP would help
>>>> greatly in doing so.  Modern TCPs (may) have the TCP timestamp
>>>> option turned on (I know modern Linux systems do), so I don't know
>>>> of anything needed there beyond ensuring the TCP information is
>>>> made available somehow, if it isn't already.  Being able to
>>>> reliably tell people "The network is broken, you need to fix (your
>>>> OS/your router/your broadband gear)" is productive.  And to deploy
>>>> IPv6 we're looking at deploying new home kit anyway.
>>>
>>> We can look into that.  Suggestions welcome.
>>
>> The first step is detection: simple timestamps get you that.
>
> We can detect delay already (at least RTT delay; one-way is tough, but
> we can approximate how much we are above the low point of one-way delay).
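
(Roughly like this, as a sketch; the unknown clock offset cancels
against the observed minimum, though real code also has to track clock
drift, which is ignored here:)

# Relative one-way delay: (arrival - send) includes an unknown clock
# offset, but subtracting the minimum ever observed cancels it, leaving
# the queueing delay above the path's low point.
min_transit = float("inf")

def delay_above_low_point(send_ts, arrival_ts):
    global min_transit
    transit = arrival_ts - send_ts
    min_transit = min(min_transit, transit)
    return transit - min_transit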
>
>> The next step is to locate the hop: basically, a traceroute-like
>> algorithm that looks for the hop where the latency goes up
>> unexpectedly.  There is a commercial tool called "PingPlotter" which
>> roughly does this and plots the result graphically.
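
(The shape of that algorithm, as a rough sketch: send probes with
increasing TTL and time the ICMP "time exceeded" replies; the hop whose
RTT jumps when the link is loaded is the suspect.  It needs root for
the raw socket, and a real tool would send many probes per TTL and
match replies to probes, which is omitted here:)

import socket, time

def probe_hop(dest_ip, ttl, timeout=2.0, port=33434):
    # One traceroute-style probe: a UDP packet with a limited TTL; the
    # router at that hop answers with ICMP "time exceeded", which we time.
    recv = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                         socket.getprotobyname("icmp"))
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv.settimeout(timeout)
    send.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    try:
        t0 = time.monotonic()
        send.sendto(b"", (dest_ip, port))
        _, addr = recv.recvfrom(512)
        return addr[0], (time.monotonic() - t0) * 1000.0
    except socket.timeout:
        return None, None
    finally:
        send.close()
        recv.close()

for ttl in range(1, 16):
    hop, rtt = probe_hop("198.51.100.1", ttl)   # example/documentation address
    print(ttl, hop or "*", f"{rtt:.1f} ms" if rtt is not None else "")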
>
> So diagnostic tools.
>
>>>>       o designing good congestion avoidance that will work in an
>>>> unbroken, unbloated network is clearly needed.  But I don't think
>>>> heroic engineering around bufferbloat is worthwhile right now for
>>>> RTP; that effort is better put into the solutions outlined above, I
>>>> think.  Trying to do so when we've already lost the war
>>>> (teleconferencing isn't interesting when talking half way to the
>>>> moon) is not productive, and getting stable servo systems to work
>>>> not just at the 100ms level but at the multi-second level, when the
>>>> multi-second level isn't even usable for the application, is a
>>>> waste.  RTP == Real-Time Transport Protocol; when the network is no
>>>> longer real time, that's an oxymoron.
>>>
>>> In practice it really does work most of the time.  But not all.
>>
>> Yes, but I worry that as more applications that move big stuff around
>> are deployed, and Windows XP retires, the situation is only going to
>> get worse.
>
> Could be.
>
>


