[R-C] Most problems are at the bottleneck: was Re: bufferbloat-induced delay at a non-bottleneck node

Fri Oct 14 22:28:37 CEST 2011

On 10/13/2011 7:43 PM, Jim Gettys wrote:
> On 10/13/2011 06:46 PM, Randell Jesup wrote:
>> Yes - though in my case for desktops it's generally the main internet
>> or the other end's downstream, and for wireless it's usually 802.11 (I
>> have FiOS something like 30 or 35Mbps down, 20Mbps up).
>
> The problem is that wireless is highly variable, and roughly comparable
> to broadband bandwidth.  So we get the bottleneck going back and forth
> (particularly since wireless is shared, and so others sharing the
> wireless can slow the wireless bandwidth).
>
> Best strategy for most home users is to try to get the bottleneck firmly
> into the broadband link and use bandwidth shaping to control the
> buffering there, since the host OS is not under your control.  So you
> have a good excuse to go buy the shiny 802.11n router you have been
> lusting after and hadn't convinced your wife/husband to buy.....  If you
> do that, you can get really good behaviour today (until you wander too
> far from your AP).

That's fine for me, but that advice doesn't generally help our users.

>>> Yup. Ergo the screed, trying to get people to stop before making things
>>> worse.  The irony is that I do understand that, were it not for the fact
>>> that browsers have long since discarded HTTP's 2 connection rule, it
>>> might be a good idea, and help encourage better behaviour.
>>
>> SPDY might help some here (though part of SPDY's purpose is to
>> continue to saturate that TCP connection even better, so maybe not).
>
> It helps the transient problem.  It won't help if you are using SPDY for
> bulk download of something the way HTTP is often abused for.
>
> And it takes time for the buffers to fill, so it might help quite a
> lot.  The buffers fill at one packet/ack I gather; the acks get further
> and further apart as the buffer fills.

So, it might help if for no other reason than reducing the number of TCP 
connections and startups, and reducing the number of congestion-control 
streams.

>> We can't control other browsers/devices on the same connection; we may
>> be able to control other code within the same browser.

My point is that while external flows are outside our control, internal 
browser TCP flows are within our control.

>>> We can't really make our jitter buffers so big as to make for decent
>>> audio/video when bufferbloat is present, unless you like talking to
>>> someone half way to the moon (or further).  Netalyzr shows the problem
>>> in broadband, but our OS's and home routers are often even worse.
>>
>> The jitter buffers don't have to be that large - in steady-state, you
>> have a lot of delay.  You do have to manage delay some, but delay in
>> the network doesn't directly affect you.  Transitions in and out of
>> bufferbloat will, but the jitter buffer should handle that.
>
> I fear the spikes I see in my packet traces.  I see multiple retransmits
> and a bunch of packets out of order each time I go through one of the
> buffer fill cycles.

Not something I expect with RTP data - we don't retransmit on drops. 
When looking at TCP data I would expect retransmits once those queues fill.

>>> In the short/immediate term, mitigations are possible.  My home network
>>> now works tremendously better than it did a year ago, and yours can
>>> immediately too, even with many existing home routers.  But doing so is
>>> probably beyond non-network wizards today.
>>
>> Yes.  Useful for looking into, but not for solving the problem.  (And
>> pressuring router makers - but that's a near-0-margin game for most of
>> them.)
>
> Again, the approach I have is to build a home router that actually works
> right; ergo CeroWrt; the vendors can pick up the results as they see fit.

That's about the only obvious way; they mostly license the base router 
code from the HW vendor or a 3rd-party SW vendor, then put their 
"corporate UI" and some features on top of it, from what I can tell.

The problem I would expect is that "hobbyist" router firmware is often
not usable by manufacturers because of license issues, or if it is, it's
too hard to reskin in their corporate layout, or it's too hard for them
to easily configure out stuff they don't want, etc.  And there's no one
*trying* to sell them on this, unless you can get the
SoC/reference-design people to pick it up.

>>>         o exposing the bloat problem so that blame can be apportioned is
>>> *really* important.  Timestamps would help greatly here in rtp in doing
>>> so.  Modern TCP's (may) have the TCP timestamp option turned on (I know
>>> modern Linux systems do), so I don't know of anything needed there
>>> beyond ensuring the TCP information is made available somehow, if it
>>> isn't already. Being able to reliably tell people: "The network is
>>> broken, you need to fix (your OS/your router/your broadband gear)." is
>>> productive.  And to deploy IPv6 we're looking at deploying new home kit
>>> anyway.
>>
>> We can look into that.  Suggestions welcome.
>
> The first step is detection: simple timestamps get you that.

We can detect delay already (at least RTT delay; one-way is tough, but 
we can approximate how much we are above the low point of one-way delay).
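
As a rough illustration of the "above the low point" approximation, here is a
minimal sketch.  It assumes each packet carries a sender timestamp and that the
receiver records its own arrival time; the two clocks are not synchronized, so
only the difference from the smallest value seen so far means anything.  The
function name and the sample format are made up for this example, and clock
drift is ignored.

```python
def queuing_delay_estimates(samples):
    """Estimate delay above the observed one-way minimum.

    samples: iterable of (send_ts, recv_ts) pairs in seconds, taken from
    unsynchronized sender and receiver clocks (hypothetical input format).
    Returns one estimate per sample, in seconds.
    """
    baseline = None  # lowest raw (offset-contaminated) one-way delay so far
    estimates = []
    for send_ts, recv_ts in samples:
        raw = recv_ts - send_ts  # true delay plus an unknown clock offset
        if baseline is None or raw < baseline:
            baseline = raw
        # The unknown offset cancels out in the subtraction.
        estimates.append(raw - baseline)
    return estimates
```

An estimate that keeps growing suggests a queue filling somewhere on the path;
it says nothing about which hop is responsible.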

> The next step is to locate the hop; basically, a traceroute like
> algorithm that looks for the hop where the latency goes up unexpectedly
> identifies what hop.  There is a commercial tool called "pingplotter"
> which roughly does this and plots the result graphically.

So diagnostic tools.
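
A minimal sketch of the hop-location idea described above, assuming per-hop
RTT samples have already been collected by some traceroute-like prober, once
with the link idle and once under load.  This is not how pingplotter works
internally (I don't know that); the function name, the input layout and the
50 ms threshold are all assumptions for illustration.

```python
from statistics import median


def find_bloated_hop(idle_rtts, loaded_rtts, threshold_ms=50.0):
    """Return the 1-based index of the first hop whose RTT rises sharply
    under load, or None if no hop stands out.

    idle_rtts, loaded_rtts: lists with one entry per hop, each entry a list
    of RTT samples in milliseconds (hypothetical input format).
    threshold_ms: assumed cutoff for an "unexpected" increase.
    """
    prev_increase = 0.0
    for hop, (idle, loaded) in enumerate(zip(idle_rtts, loaded_rtts), start=1):
        increase = median(loaded) - median(idle)
        # Delay added at an earlier hop shows up at every later hop too,
        # so only the extra increase at this hop is counted.
        if increase - prev_increase > threshold_ms:
            return hop
        prev_increase = increase
    return None
```

Medians are used instead of means so that a single stray sample does not
flag a hop.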

>>>       o designing good congestion avoidance that will work in an
>>> unbroken, unbloated network is clearly needed.  But I don't think heroic
>>> engineering around bufferbloat is worthwhile right now for RTP; that
>>> effort is better put into the solutions outlined above, I think.  Trying
>>> to do so when we've already lost the war (teleconferencing isn't
>>> interesting when talking half way to the moon) is not productive, and
>>> getting stable servo systems to work not just at the 100ms level, but
>>> the multi-second level, when multi-second level isn't even usable for
>>> the application is a waste.  RTP == Real-Time Transport Protocol, when
>>> the network is no longer real time, is an oxymoron.
>>
>> In practice it really does work most of the time.  But not all.
>
> Yes, but I worry as more applications that move big stuff around deploy,
> and Windows XP retires, the situation is only going to get worse.

Could be.

-- 
Randell Jesup
randell-ietf at jesup.org