[R-C] Packet loss response - but how?

Bill Ver Steeg (versteb) versteb at cisco.com
Fri May 4 19:31:09 CEST 2012


Randell-

Yup, this all makes sense.

Regarding Netflix and other ABR flows......

I would add that the increasing prevalence of bursty Adaptive BitRate
video (HTTP get-get-get of 2-10 second chunks of video) makes the
detection of spare link capacity and/or cross traffic much more
difficult. The ABR traffic pattern boils down to a square wave pattern
of a totally saturated last mile for a few seconds followed by an idle
link for a few seconds. The square wave actually has the TCP sawtooth
modulated on top of it, so there are secondary effects. Throw in a few
instances of ABR video on a given last mile and things get very
interesting.
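
For a sense of scale, a back-of-the-envelope sketch of that duty cycle
(the chunk length, video rate, and link rate below are purely
illustrative, not measurements):

    # Steady-state ABR "square wave": saturate the last mile while a
    # chunk drains, then go idle until the next HTTP GET.
    chunk_seconds = 4.0    # seconds of video per chunk (typically 2-10s)
    video_rate = 5e6       # encoded video bitrate, bits/sec
    link_rate = 20e6       # last-mile capacity, bits/sec

    busy = chunk_seconds * video_rate / link_rate   # link saturated: 1.0s
    idle = chunk_seconds - busy                      # link idle:      3.0s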

The solution to this problem is not in scope for the RTCWeb/RTP work,
but I sure wish that the ABR folks would find a way to smooth out their
flows. We have been knocking around some ideas in this area in other
discussions, so if anybody is interested in this please drop me a note.



bvs


-----Original Message-----
From: rtp-congestion-bounces at alvestrand.no
[mailto:rtp-congestion-bounces at alvestrand.no] On Behalf Of Randell Jesup
Sent: Friday, May 04, 2012 12:33 PM
To: rtp-congestion at alvestrand.no
Subject: Re: [R-C] Packet loss response - but how?

On 5/4/2012 9:50 AM, Bill Ver Steeg (versteb) wrote:
> The RTP timestamps are certainly our friends.
>
> I am setting up to run some experiments with the various common buffer
> management algorithms to see what conclusions can be drawn from
> inter-packet arrival times. I suspect that the results will vary wildly
> from the RED-like algorithms to the more primitive tail-drop-like
> algorithms. In the case of RED-like algorithms, we will hopefully not
> get too much delay/bloat before the drop event provides a trigger. For
> the tail-drop-like algorithms, we may have to use the increasing
> delay/bloat trend as a trigger.

This would match my experience.  As mentioned, I found that access-link
congestion loss (especially when not competing with sustained TCP flows,
which is pretty normal for home use, especially if no one is watching
Netflix...) results in a sawtooth delay pattern with losses at the delay
drops.

This also happens (with more noise and often a faster ramp) when
competing, especially when competing with small numbers of flows.  Not
really unexpected.  Since this sort of drop corresponds to a full buffer,
it's pretty much a 'red flag' for a realtime flow.
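
A minimal sketch of that pattern match on the receive side, using the
relative one-way delay derived from the RTP timestamps (the threshold
and packet-record format are illustrative assumptions):

    # Flag losses that coincide with a sharp fall in queuing delay - the
    # "buffer just emptied after being full" sawtooth signature.
    def classify_losses(packets, delay_drop_thresh=0.020):
        """packets: iterable of (seq, send_ts, recv_ts), times in seconds;
        send_ts is reconstructed from the RTP timestamp."""
        events = []
        prev_seq = prev_delay = None
        for seq, send_ts, recv_ts in packets:
            delay = recv_ts - send_ts   # relative delay; the constant clock
                                        # offset cancels when comparing changes
            if prev_seq is not None and seq > prev_seq + 1:
                lost = seq - prev_seq - 1
                if prev_delay - delay > delay_drop_thresh:
                    events.append(("full-buffer loss", lost))  # congestion red flag
                else:
                    events.append(("loss", lost))
            prev_seq, prev_delay = seq, delay
        return events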

RED drops I found to be more useful in avoiding delay (of course).  My
general mechanism was to drop the transmission rate (bandwidth estimate)
by an amount proportional to the drop rate; tail-queue-type drops
(sawtooth) caused much sharper bandwidth drops.  I simply assume all
drops are in some way related to congestion.
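
In pseudo-code terms, a minimal sketch of that rule (the scaling
constants and the per-step floor are made up for illustration):

    # Cut the bandwidth estimate in proportion to the observed drop rate,
    # and cut harder when the drops look like tail-drop/sawtooth losses.
    def adjust_estimate(bw_estimate, drop_rate, sawtooth_drop,
                        k_random=2.0, k_sawtooth=6.0, floor=0.5):
        k = k_sawtooth if sawtooth_drop else k_random
        factor = max(floor, 1.0 - k * drop_rate)   # never cut below 50% in one step
        return bw_estimate * factor

    # e.g. 2 Mbps estimate, 2% loss at a full tail-drop queue -> ~1.76 Mbps
    new_bw = adjust_estimate(2_000_000, 0.02, sawtooth_drop=True)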

>   As I think about the LEDBAT discussions,
> I am concerned about the interaction between the various algorithms -
> but some data should be informative.

Absolutely.

> We may even be able to differentiate between error-driven loss and
> congestion-driven loss, particularly if the noise is on the last hop of
> the network and thus downstream of the congested queue (which is
> typically where the noise occurs). In my tiny brain, you should be able
> to see a gap in the time record corresponding to a packet that was
> dropped due to last-mile noise. A packet dropped in the queue upstream
> of the last mile bottleneck would not have that type of time gap. You do
> need to consider cross traffic in this thought exercise, but statistical
> methods may be able to separate persistent congestion from persistent
> noise-driven loss.

Exactly the mechanism I used to differentiate "fishy" losses from
"random" ones; "fishy" losses as mentioned cause bigger responses.  I
still dropped bandwidth on "random" drops, which can be congestion drops
from RED in a core router so long as the router queue isn't too long.
You'd also see those from "minimal queue" tail-drop routers.  I did use
a jitter buffer for determining losses (and for my filter info) that was
separate from the normal jitter buffer, which, being adaptive, might not
hold the data long enough for me.  I actually kept around a second of
delay/loss data on the video channel, and *if there was no loss* or
large delay ramp, I only reported stats every second or two.
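
A sketch of that gap test (uniform back-to-back packets, a known
bottleneck rate, and the halfway threshold are all simplifying
assumptions for illustration):

    # A packet dropped by noise downstream of the bottleneck still spent
    # its serialization time on the wire, so its absence leaves a hole in
    # the arrival times.  A packet dropped upstream, in the congested
    # queue, does not leave that hole.
    def loss_is_fishy(recv_gap_s, lost_count, pkt_bits, bottleneck_bps):
        pkt_time = pkt_bits / bottleneck_bps
        noise_gap = (lost_count + 1) * pkt_time   # lost packets crossed the bottleneck
        queue_gap = pkt_time                      # lost packets never got that far
        # Closer to the queue-drop case than the noise-drop case -> treat
        # the loss as congestion ("fishy") and react more strongly.
        return recv_gap_s < (noise_gap + queue_gap) / 2.0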

> TL;DR - We can probably tell that we have queues building prior to the
> actual loss event, particularly when we need to overcome limitations of
> poor buffer management algorithms.

If the queues are large enough, or if the over-bandwidth is low enough,
yes.  If there's a heavy burst of traffic (think modern browsers
minimizing pageload time by opening parallel connections to sharded
servers), then you may not get a chance; you may go from no delay to
200ms of taildrop delay in an RTT or two - or even between two 20 or
30ms packets.  And you need to filter enough to decide if it's jitter or
delay.  (You can make an argument that 'jitter' is really the sum of
deltas in path queues, but that doesn't help you much in deciding
whether to react to it or not.)
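
One way to do that filtering, sketched with illustrative constants (the
smoothing factor and trigger threshold are not from any particular
implementation):

    # Smooth the relative one-way delay so that a genuine queue ramp can
    # be told apart from ordinary jitter before any packet is dropped.
    class DelayTrend:
        def __init__(self, alpha=0.1, trigger_s=0.030):
            self.alpha = alpha
            self.trigger_s = trigger_s
            self.smoothed = None
            self.baseline = None    # lowest smoothed delay seen so far

        def update(self, relative_delay_s):
            if self.smoothed is None:
                self.smoothed = self.baseline = relative_delay_s
            else:
                self.smoothed += self.alpha * (relative_delay_s - self.smoothed)
                self.baseline = min(self.baseline, self.smoothed)
            # True once the filtered delay has ramped well above its floor,
            # i.e. a queue is building even though nothing has been lost yet.
            return (self.smoothed - self.baseline) > self.trigger_s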

Data would be useful... :-)

Generally, for realtime media you really want to be undershooting
slightly most of the time in order to make sure the queues stay at/near
0.  The more uncertainty you have, the more you want to undershoot.  A
stable delay signal makes it fairly safe to probe for additional
bandwidth because you'll get a quick response, and if the probe is a
"small" step relative to current bandwidth, then the time to recognize
the filtered delay signal, inform the other side, and have them adapt is
short (roughly filter delay + RTT + encoding delay).
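
As a worked example of that response-time budget (all of the numbers are
illustrative):

    # How much queue a small probe can add before the loop reacts.
    filter_delay = 0.100    # time for the filtered delay signal to register, sec
    rtt = 0.080             # feedback back to the sender, sec
    encoder_delay = 0.050   # time for the encoder to act on the new target, sec
    reaction_time = filter_delay + rtt + encoder_delay   # ~0.23 s of exposure

    probe_step = 0.05           # probe 5% above the current rate
    current_bw = 1_000_000      # bits/sec

    # Worst case, the extra 5% sits in the bottleneck queue for the whole
    # reaction time: ~11,500 bits, i.e. roughly 11 ms of added delay at
    # the channel rate - small enough to be safe if the delay signal is stable.
    backlog_bits = probe_step * current_bw * reaction_time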

High jitter can be the result of wireless or cross-traffic,
unfortunately.

Also, especially near startup, my equivalent to slow-start was much more
aggressive initially to find the safe point, but with each overshoot
(and drop back below the apparent rate) in the same bandwidth range I
would reduce the magnitude of the next probes until we had pretty much
determined the safe rate.  This is most effective in finding the channel
bandwidth when there is no significant sustained competing traffic on
the bottleneck link.  If I believed I'd found the channel bandwidth, I
would remember that, and be much less likely to probe over that limit,
though I would do so occasionally to see if there had been a change.
This allowed for faster recovery from short-duration competing traffic
(the most common case) without overshooting the channel bandwidth.  Note
that the more effective your queue detection logic is, the less you need
that sort of heuristic; it may have been overkill on my part.
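
A rough sketch of that probing schedule (the initial step, decay factor,
and occasional re-probe interval are assumptions for illustration):

    import time

    # Probe aggressively at first, shrink the probe after each overshoot
    # in the same range, and remember the channel rate once it is found.
    class Prober:
        def __init__(self, initial_step=0.5, decay=0.5, min_step=0.02,
                     reprobe_interval_s=30.0):
            self.step = initial_step          # first probe: 50% above current rate
            self.decay = decay
            self.min_step = min_step
            self.reprobe_interval_s = reprobe_interval_s
            self.channel_bw = None            # remembered channel rate, if any
            self.last_reprobe = time.monotonic()

        def next_target(self, current_bw):
            target = current_bw * (1.0 + self.step)
            if self.channel_bw is not None and target > self.channel_bw:
                # Stay under the remembered limit, except for an occasional
                # peek above it to see whether the channel has changed.
                if time.monotonic() - self.last_reprobe < self.reprobe_interval_s:
                    target = self.channel_bw
                else:
                    self.last_reprobe = time.monotonic()
            return target

        def on_overshoot(self, apparent_rate):
            # Each overshoot in the same bandwidth range: shrink the next
            # probe and treat the apparent rate as the channel bandwidth.
            self.step = max(self.min_step, self.step * self.decay)
            self.channel_bw = apparent_rate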

-- 
Randell Jesup
randell-ietf at jesup.org
