[R-C] Congestion Control BOF

Randell Jesup randell-ietf at jesup.org
Tue Oct 11 17:17:04 CEST 2011


FYI - if anyone is getting this and hasn't joined the R-C (congestion 
control) list, please do so as I'm going to stop CC-ing individuals at 
some point today.

On 10/11/2011 3:11 AM, Henrik Lundin wrote:
> Randell,
>
> This is a good start, indeed! Some comments on the details of your
> initial email are below.

Thanks!

> About simulation: Does anyone have any suggestions for suitable
> simulation tools? ns2 is one choice of course, but I'm not very pleased
> with it.

I don't have strong opinions; I've used both dummynet (for simple 
testing, though the latest versions are better at simulating variable 
delay and loss) and NetEm.  I'm not an expert with NetEm, but it at 
least provides normal distributions of delay, correlation of 
delays/losses, etc.  You'd need to combine it with a rate-control 
queueing discipline to model a router/NAT/modem.

For modern uses of dummynet in PlanetLab, etc., see
http://info.iet.unipi.it/~luigi/dummynet/
and
http://info.iet.unipi.it/~luigi/papers/20100316-cc-preprint.pdf

More comments in-line...


>     2. I would use something more adaptive than the current _beta in
>     RemoteRateControl.  Currently if we're over-bandwidth, it cuts
>     bandwidth by 5 or 10% (depending on where we are relative to the
>     "known good max") and hopes that's below the bottleneck bandwidth.
>     On a large change (or bad mis-estimation) the bandwidth might still
>     be above bottleneck rate, and you *really* want to get down below it
>     to drain the buffers fast.
>
>
>     I would prefer to use the effective incoming "slope" (which the
>     magnitude of the Kalman filter should tell us) to estimate how far
>     over-bandwidth we are.  If we're receiving 500Kbps, and 33ms frames
>     are coming in at an average of 36ms, the bandwidth estimate would be
>     ~458Kbps, so to be safe and allow those buffers to drain, I'd
>     probably want to target something like 430Kbps until they drain.  If
>     they came in at 45ms, which implies ~366Kbps available, I'd drop to
>     somewhere around 300 at a guess - dropping to 430 would cause delay
>     to continue to increase.  Also, if there's loss I feel you should
>     drop the rate more, since the bottleneck router (if it has short
>     queues) may have started to tail-drop - and if it random-drops, your
>     estimate may be too high.
>
>
> I do not agree with you here. When an over-use is detected, we propose
> to measure the /actual/ throughput (over the last 1 second), and set the
> target bitrate to beta times this throughput. Since the measured
> throughput is a rate that evidently was feasible (at least during that 1
> second), any beta < 1 should ensure that the buffers get drained, but of
> course at different rates depending on the magnitude of beta.

For static analysis (detecting the bandwidth of the modem, etc.) I 
agree with you.  However, for mid-call detection of changes in 
bandwidth, and particularly of cross-traffic, using recent throughput 
as a guide will be wrong.  The point is that the bandwidth has changed, 
and if it changed by more than beta, you'll still be over.

Even at startup you may find recent traffic is a bad predictor (and you 
have less to go on), especially if you "start low" and ramp up quickly, 
as Magnus and I suggested, instead of starting high and cutting back 
(see below).
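
To make the arithmetic from my earlier mail concrete, here is a minimal 
sketch of the slope-based estimate I'm describing (function name and 
structure are my invention, not the actual webrtc.org code):

```python
def slope_bandwidth_estimate(recv_kbps, frame_interval_ms, arrival_interval_ms):
    """Scale the measured receive rate by the ratio of the sender's
    frame spacing to the observed arrival spacing.  Frames arriving
    slower than they were sent imply the bottleneck is below the
    current send rate."""
    return recv_kbps * frame_interval_ms / arrival_interval_ms

# 33ms frames arriving every 36ms at 500 kbps -> ~458 kbps available
print(round(slope_bandwidth_estimate(500, 33, 36), 1))  # 458.3
# Arriving every 45ms -> ~367 kbps available
print(round(slope_bandwidth_estimate(500, 33, 45), 1))  # 366.7
```

The target rate should then sit some safety margin below the estimate 
(e.g. ~430 kbps in the first case) so the queues actually drain rather 
than merely stop growing.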

>     3. Similarly, I'd want to allow for a smaller 'beta' (faster cuts,
>     alpha for faster increases) at the start of a call.  It's unclear to
>     me from the code if that's the case currently, though if it is it's
>     a side-effect of other things like the rcRegion and the maxHoldRate.
>
>
> I'm not sure I fully understand what you mean by "faster cuts", but I
> gather that you're aiming at speeding up the convergence (up or down)
> during the startup phase. In general, we do agree that it's a good idea
> to let the algorithm be more agile in the beginning. Regarding the beta
> value, see my comment to #2 above.

"faster cuts" means faster reduction in bandwidth; so yes I'm referring 
to running faster convergence in startup.  It also may be smart to 
return to this region on interface changes (definitely) and on major 
cross-traffic changes (which often are a spike of web-page-loading, not 
a long-term flow).

>     Overall, the general approach in RateIncreaseFactor for alpha and
>     the code that calls it is good and matches my understanding of
>     reasonable approaches.  I might also want to slow the rate of
>     increase (especially over the max rate where we didn't see delay
>     increase) every time we get knocked back, which allows for a fast
>     converging search for the channel bandwidth, again especially at the
>     start of a call, though this could be in play after any significant
>     change.
>
>
> This was already implemented in the algorithm, although not described in
> the document.

Good.  I didn't see anything that directly looked at how recently/often 
we'd had to reduce bitrate.
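
The "slow the increase every time we get knocked back" idea could be 
sketched like this (purely illustrative; the names and the damping 
constant are my invention):

```python
def damped_increase_factor(base_alpha, recent_cuts, damping=0.5):
    """Shrink the multiplicative rate-increase factor toward 1.0 for
    each recent forced rate reduction, so the probe for channel
    bandwidth converges instead of oscillating."""
    return 1.0 + (base_alpha - 1.0) * damping ** recent_cuts

print(round(damped_increase_factor(1.08, 0), 2))  # 1.08 - full-speed increase
print(round(damped_increase_factor(1.08, 2), 2))  # 1.02 - gentler after two cuts
```

The count of recent cuts would decay over time, so after a quiet period 
the algorithm probes aggressively again.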

>     4. If a drop occurs at or before the bottleneck, you'll get an
>     unusually-short arrival delta compared to the Ts delta.  I would
>     consider such drops "fishy" as they may indicate tail-dropping
>     somewhere in the chain, and if we see "too many" of them drop
>     bandwidth by a larger amount, even if the Kalman filter indicates Hold.
>
>     This would be especially important if the bottleneck has a (very)
>     short queue - it may build up, drop, build up, drop, etc and to the
>     Kalman filter it might look like you weren't over-bandwidth, just a
>     lot of jitter.
>
>
> This is probably an important corner case (maybe not even a corner...).
> We would have to come up with some extended loss processing at the
> receiver. It's probably necessary to test this in a simulation
> environment where the bottleneck queues can be varied at will.

This is important in my experience - not every device implements 
bufferbloat. ;-)  I also believe there are router/bottleneck 
configurations that will cause early bottleneck signals to be "lost" 
later on, due to router packet scheduling and cross-traffic.  This is 
accentuated by the common bottleneck being the local upstream link, so 
many routers and much cross-traffic sit downstream of it.
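
The "fishy" drop test I have in mind could be as simple as this (a 
hypothetical receiver-side heuristic; the threshold ratio is invented 
for illustration):

```python
def is_fishy_loss(seq_gap, ts_delta_ms, arrival_delta_ms, ratio=0.5):
    """Flag a loss as likely tail-drop: a packet is missing
    (seq_gap > 1) yet the surviving packet arrived much sooner than
    its timestamp spacing predicts, i.e. it was queued right behind
    the dropped one at the bottleneck."""
    return seq_gap > 1 and arrival_delta_ms < ratio * ts_delta_ms

# Two 33ms frames with one lost in between: 66ms of timestamp spacing
# but only 20ms of arrival spacing looks like a tail-drop.
print(is_fishy_loss(2, 66, 20))  # True
print(is_fishy_loss(2, 66, 60))  # False - plain loss, normal spacing
```

Counting how many such losses occur per window would then drive the 
"drop bandwidth by a larger amount, even if the filter indicates Hold" 
rule.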

>     5. It appears that the code in remote_rate_control.cc (i.e.receiver)
>     currently starts at the max-configured bitrate; is this correct?
>     What's the impact of this; what does the sender-side start at?
>
>
> This is correct, and means that the receiver side does not enforce or
> ask for any rate reduction until it has experienced the first
> over-bandwidth.

Ok, as per above and Magnus's comments, I tend to disagree with this - 
it almost guarantees you'll be in recovery right at the start of the 
call, which is not the best experience for 1-to-1 communication, IMHO 
(and in my experience).  See my extensive discussion of starting 
bandwidths on rtcweb (I should copy it here for archival).

>     #2 (probably) needs drift compensation since we can't guarantee that
>     the timestamps of different streams are sourced from the same
>     clock.  It's possible that for "normal" drifts this wouldn't be an
>     issue, but I'm leery since I've seen some PBXes and devices that
>     have significant drift (though in this case we only really care
>     about differential drift).
>
>     Note that currently the UpdateKalman() function has the drift
>     compensation inside of it; for either of these it would need to move
>     out of UpdateKalman().
>
>
> If you follow the code, you'll see that the CurrentDrift() method is a
> dummy always returning 1.0. TBD :)

Hmmm.  Oh well.  I guarantee there are devices out there that drift a 
lot...  And even have time go backwards (great fun there).

>
>
>     A primary change needed (in both of these) is that the current
>     Kalman filter is based around an assumption of a 30fps source
>     rate.  From page 6 of Harald's draft:
>
>         The variance var_v = sigma(v,i)^2 is estimated using an
>         exponential averaging filter, modified for variable sampling
>         rate
>
>           var_v_hat = beta*sigma(v,i-1)^2 + (1-beta)*z(i)^2
>
>           beta = (1-alpha)^(30/(1000 * f_max))
>
>         where f_max = max {1/(T(j) - T(j-1))} for j in i-K+1...i is
>         the highest rate at which frames have been captured by the
>         camera over the last K frames, and alpha is a filter
>         coefficient typically chosen as a number in the interval
>         [0.1, 0.001].  Since our assumption that v(i) should be
>         zero-mean WGN is less accurate in some cases, we have
>         introduced an additional outlier filter around the updates
>         of var_v_hat.  If z(i) > 3 sqrt(var_v_hat) the filter is
>         updated with 3 sqrt(var_v_hat) rather than z(i).  In a
>         similar way, Q(i) is chosen as a diagonal matrix with main
>         diagonal elements given by
>
>           diag(Q(i)) = 30/(1000 * f_max) [10^-10 10^-2]^T
>
>         It is necessary to scale these filter parameters with the
>         frame rate to make the detector respond as quickly at low
>         frame rates as at high frame rates.
>
>
>     This assumes that the frame rate == 1/shortest_ts_diff, which is a
>     slightly loose assumption, even without merging streams - though I
>     agree most encoders encode at a fairly fixed rate, and 30 is the
>     most common.  But it still could be well off especially at
>     transition points (say from still to moving) - the encoder might run
>     at 30fps (or more) when still, and drop to 5 or 10 fps (or lower)
>     when the scene transitions into high motion.
>
>     I'll also note that the noise estimate is also based on a 30fps
>     assumption, though that might be easier to modify.  The noise
>     estimate slews more quickly for the first ~ 10 seconds.
>
>
> The 30 fps that you see in the equations is because the tuning of the
> design parameters was done for a nominal 30 fps. But the formulae are
> actually there to compensate the parameters such that the system should
> work for other frame rates too. It looks hard-coded, but it's not.

Ok, I'll look at the code, though I thought I saw the same hard-coded 
assumptions.

> You
> could argue that using the max frame rate (over some window) is not
> correct. We tried with mean frame rate at one stage, but had some
> problems with bias. Probably we just didn't gather enough facts and
> understanding to resolve the issue properly.

A simple weighted moving average of some sort is probably good enough. 
Otherwise detection right after a sudden drop in frame rate could be 
compromised (and sudden drops in frame rate may also correlate with 
increases in bandwidth usage, depending on the encoder).
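
A sketch of the weighted-moving-average alternative (the weight is 
chosen arbitrarily for illustration):

```python
def ewma_frame_rate(fps_avg, frame_interval_ms, weight=0.1):
    """Blend the newest frame's instantaneous rate into a running
    average, so a sudden drop to 5-10 fps shows up within a few
    frames instead of being masked by a max over a long window."""
    inst_fps = 1000.0 / frame_interval_ms
    return (1.0 - weight) * fps_avg + weight * inst_fps

fps = 30.0
for _ in range(5):           # encoder suddenly emits 100ms (10 fps) frames
    fps = ewma_frame_rate(fps, 100.0)
print(round(fps, 2))  # 21.81 - the average already reflects the slowdown
```

A max-over-window estimator would still report 30 fps at this point, 
which is exactly the detection lag I'm worried about.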

>     BTW - there's lots of code in there to generate matlab output, to
>     run it standalone (though that seems turned off), etc.  Any guidance
>     or pointers to docs on how to build and run these tests, and how to
>     get these stats out of running calls?  I'm sure I can root around
>     and figure it out, but a pointer would save a bunch of time.  Thanks!
>
>
> Once upon a time, we used this code for real-time visualization of the
> bandwidth estimator using a Matlab Engine. It worked well on Windows
> with the ancient Matlab 5 or 6, or so. We're currently working in a
> linux environment with the latest and greatest Matlab release, and now
> the real-time plotting is quite far from real-time. There is
> unfortunately no documentation describing what's going on. I'll see
> what old code I can dig out.

Well, I don't own Matlab, so it won't help me anyway.  And if it's just 
for plotting, there must be simpler ways to do real-time data plotting 
on Linux (or elsewhere).


-- 
Randell Jesup
randell-ietf at jesup.org

