[R-C] RRTCC issues: loss, decrease rate

Mo Zanaty (mzanaty) mzanaty at cisco.com
Wed Aug 8 07:04:38 CEST 2012


In the case of loss due to congestion (a full queue or AQM action), the loss itself seems like the right signal to process. Why wait to infer congestion from the subsequent delay pattern, which can be speculative/unreliable, rather than from the loss itself?

If the goal is to distinguish congestion from stochastic loss, that is a general problem that probably needs more thought than the RRTCC 3-sigma outlier filter, or Kalman filter (which is designed to filter stochastic jitter but not losses), or Randell's "fishy" filter. There should be ample research available on this topic from many years of work on TCP over wireless links.

Mo

-----Original Message-----
From: rtp-congestion-bounces at alvestrand.no [mailto:rtp-congestion-bounces at alvestrand.no] On Behalf Of Randell Jesup
Sent: Tuesday, August 07, 2012 6:13 PM
To: rtp-congestion at alvestrand.no
Subject: Re: [R-C] RRTCC issues: loss, decrease rate

Alexander: thanks for the response

On 8/7/2012 8:37 AM, Alexander Kjeldaas wrote:
>
> On Mon, Aug 6, 2012 at 6:10 PM, Randell Jesup <randell-ietf at jesup.org
> <mailto:randell-ietf at jesup.org>> wrote:

>     When the packet is lost, there must have been 2 packets in the
>     buffer.  The receiver will see packet N, N+1, and then N+3 and N+4.
>     N will have been delayed around 38ms, N+1 around 40ms, N+3 about
>     22ms, and N+4 around 24ms.  Pictorially, this would look like a
>     sawtooth in the delay profile.
>
>     If you naively input these into the filter, it will start to move
>     down away from a 10% slope, and might indicate a flat queue-depth
>     profile or perhaps even negative (draining) for a short time, when
>     in fact we've been over-bandwidth the entire time.
>
>
> Averaged over a sawtooth, the queue depth *is flat*, right?  So it is
> rather a question of doing correct detection based on the output of the
> Kalman filter, and not the Kalman filter not estimating what is
> happening correctly.

Right, on average it is flat once you're steady-state full-queue.  And 
so the algorithm (if we get to that state) might believe the queue isn't 
full; i.e. in the state machine it would be in the "flat" state. 
After a while there, it would try to increase the sending rate, and it 
would see... no increase in queue depth, and increase again.  (Or at 
least it does in my thought experiment... ;-)  Of course, normally it 
would notice us getting into that state, but I can certainly conceive of 
it not noticing properly.
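
To make that thought experiment concrete, here's a quick sketch (Python, 
using the made-up numbers from the example above) showing that the 
per-packet delay deltas from the sawtooth average out to zero, which is 
exactly the "flat" queue-depth profile the estimator would see:

    # Hypothetical numbers from the example: one-way delays repeat as
    # 38, 40, (lost), 22, 24 ms while we stay over-bandwidth.  The deltas
    # between consecutive *received* packets cancel out over each period,
    # so a slope estimator sees what looks like a flat queue.
    delays_ms = [38, 40, 22, 24] * 10 + [38]   # received packets only; losses omitted
    deltas = [b - a for a, b in zip(delays_ms, delays_ms[1:])]
    print(sum(deltas) / len(deltas))           # 0.0: looks flat on average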

> Also, for the Kalman filter in RRTCC, the uncertainty of the samples
> will increase in this case, which makes the filter slower.
>
>       This would depend on the exact filter type and usage, but this
>     certainly violates some of the assumptions about error in a Kalman
>     filter (for example, Gaussian distribution).  At least in this case,
>     a Kalman filter might not be the optimum choice.
>
>
> In the non-Gaussian case, the Kalman filter is the best linear (LMMSE)
> state estimator, while in the Gaussian case it is the optimal (MMSE)
> state estimator.
>
> So while it is certainly not the optimum choice, it isn't obviously
> broken either.

As I expected.

>     * Possible modifications:
>     a) drop samples where there's a loss, since losses frequently
>     perturb the delta of the next packet
>          This will reduce the 'noise' in the inputs to the filter,
>     especially in simple cases like above.
>
>
> Which samples in your example would you not feed to the Kalman filter?

Samples where the previous packet was lost.

> Deciding when to start feeding samples into the Kalman filter is another
> classification problem, or it can be done ad-hoc (ad-hoc ~=
> non-probabilistic).
>
> Also, reducing the 'noise' in the inputs to the filter makes it react
> faster, which is not necessarily what you want in this particular case.

Well, the idea here was to segregate actions-at-a-drop from 
actions-not-at-a-drop, since in the 1-flow case I think that would 
produce a clearly distinct difference, and in that case I would want the 
filter to react faster.  (In fact, I probably want faster reaction 
anytime there's significant loss, and that may help with AQM response 
too, where the filter will tell us relatively little or nothing useful.)
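
As a rough sketch of modification (a) (not actual RRTCC code; the names 
and the update API are invented here for illustration), the receiver 
could simply withhold the delta sample whose previous packet was lost:

    def feed_filter(kalman_update, packets):
        """Feed inter-arrival deltas to the filter, skipping any sample
        whose previous packet was lost (modification 'a').  `packets` is
        an iterable of (seq, delta_ms); `kalman_update` is whatever filter
        update function is in use.  Hypothetical API, for illustration only.
        """
        prev_seq = None
        for seq, delta_ms in packets:
            lost_before = prev_seq is not None and seq != prev_seq + 1
            prev_seq = seq
            if lost_before:
                continue   # this delta is perturbed by the loss; don't feed it
            kalman_update(delta_ms)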

> Note that there is an outlier filter in RRTCC.  A sample is capped at 3
> std.dev from the (exponential decay) jitter.

That might happen to drop those, or not.  The model for that assumes 
random noise, which this really isn't.
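
For reference, here's a simplified sketch of that kind of cap (an 
exponentially decaying variance estimate with a 3-sigma limit; the alpha 
and the exact update rule are assumptions for illustration, not the 
draft's constants):

    class OutlierCap:
        """Simplified sketch of a 3-sigma outlier cap driven by an
        exponentially decaying variance estimate.  Not the exact rule or
        constants from draft-alvestrand-rtcweb-congestion."""
        def __init__(self, alpha=0.05):
            self.alpha = alpha    # assumed smoothing factor
            self.var = 1.0        # assumed initial variance

        def cap(self, sample, predicted):
            residual = sample - predicted
            self.var = (1 - self.alpha) * self.var + self.alpha * residual * residual
            limit = 3.0 * self.var ** 0.5
            return predicted + max(min(residual, limit), -limit)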

>
> (end of section 3.1)
> http://tools.ietf.org/html/draft-alvestrand-rtcweb-congestion-00#section-3.1
>
>     b) increase the "uncertainty" of the next packet dramatically in the
>     input to the filter
>          This is an attempt to get the Kalman filter to weight the
>     packet much lower, but not ignore it entirely.  I don't see this as
>     very useful in practice.
>
>
> I think this is conceptually cleaner.  What RRTCC is doing now is
> calculating the exponential decay jitter of incoming packets.   In the
> sawtooth-example, in stock RRTCC, the jitter variance will increase,
> which means that the filter will become slower.  I think this is a good
> property.

Not sure about this for this case (see above).  For Gaussian and 
probably cross-traffic jitter, I probably agree.
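
To make the "slower filter" point concrete, here's a tiny scalar-Kalman 
sketch (illustrative only, not the RRTCC filter itself) showing how a 
larger measurement-noise estimate shrinks the gain and slows the reaction:

    def kalman_gain(p, r):
        """Scalar Kalman gain: larger measurement noise r means a smaller
        gain, i.e. the filter trusts new samples less and reacts more slowly."""
        return p / (p + r)

    print(kalman_gain(1.0, 0.5))   # quiet channel: gain ~0.67, fast reaction
    print(kalman_gain(1.0, 5.0))   # noisy (sawtooth) channel: gain ~0.17, slow reaction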

> The exponential decay jitter calculations means that this will not
> happen on the *first loss*, but some time thereafter (i.e. the filter
> cannot react instantaneously).
>
> By using extra knowledge, we can probably build a more accurate model.
>   My point is that it is not "obviously broken".  A model in the filter
> that accounts for an absolute queue length while still retaining the
> current features would seem to be non-linear though, and thus require a
> different filter approach.

Maybe, or tweaks around a linear filter.  Agreed not obviously broken.

>     c) look at whether there was a significant drop in delay following a
>     loss, and use that as a separate indicator that we're over-bandwidth
>
>
> There will only be a significant drop in delay if the queue is short (no
> bufferbloat).  To know whether a drop is "significant", a notion of
> queue length must be added to the model of the network.

Actually, the drop in delay should be the same regardless of queue depth. 
The percentage drop will be less, but the absolute drop will be the same.

> The larger the queue and the higher the bandwidth, the smaller the
> drops in delay we're searching for, and assuming that the queue sits at
> the egress on the sender, additional jitter is added on the rest of the
> path.  Thus on the receiver side, ad-hoc (non-probabilistic) filtering
> could easily be far from optimal.

I agree the more streams in the queue, the smaller the drops in delay, 
even if we're the one who gets the drop (and the odds of us getting it 
go down, and jitter goes up).

>
> I think it would certainly be interesting to look at a model that
> includes queue length as well as delay, but I think it should be
> probabilistic in nature, probably inside the filter, not as ad-hoc
> filtering on the outside.

Probably correct.  And this discussion makes me believe this input 
signal (delay reduction with a drop) is almost entirely relevant to 
small numbers of streams on an otherwise non-congested link (e.g. access 
links, usually).  (And of course that's an immensely common case, and 
the main one I was designing for.)

> Alexander
>
>     In my algorithm, I termed these "fishy" losses - they implied a
>     queue suddenly shortened.  Too many (more than 1) of these in a
>     short period of time meant I would decrease sending bandwidth by a
>     significant extra amount, since the implication is not just delay,
>     but a full queue, which I really want to get out of fast.  This is a
>     form of modification 'c'.
>
>     If we're not the only stream on the bottleneck, and we just got
>     "unlucky" and had our packet get dropped, similar issues may occur,
>     but the reduction in delay of the next packet may be less or much
>     less, since we're typically metering in our packets at a frame rate.
>       With enough other traffic, the delay of the next packet will be
>     roughly flat (or 40ms in the case above).  So this modification (c)
>     is of most use in detecting the bandwidth/queue limit of a
>     relatively idle link, typically an access (especially upstream)
>     link.  It therefore may want to be combined with other mechanisms.

And what I said here matches the above.
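
For illustration, here's a hedged sketch of the "fishy" loss idea quoted 
above (the thresholds, window, and names are invented for this sketch, 
not taken from the actual implementation): flag a loss as fishy when the 
packet after it arrives with a noticeably shorter delay, and ask for an 
extra rate cut when more than one shows up in a short window:

    import time
    from collections import deque

    FISHY_DELAY_DROP_MS = 15    # assumed: a "significant" delay drop after a loss
    FISHY_WINDOW_S = 2.0        # assumed window for counting fishy losses
    FISHY_COUNT_FOR_CUT = 2     # "more than 1" fishy loss triggers the extra cut

    fishy_times = deque()

    def on_packet(lost_before, delay_drop_ms, now=None):
        """Return True if an extra rate cut is warranted (modification 'c').
        lost_before: the previous packet in sequence was lost.
        delay_drop_ms: how much shorter this packet's delay is than the
                       delay before the loss (positive = queue shortened).
        """
        now = time.monotonic() if now is None else now
        if lost_before and delay_drop_ms >= FISHY_DELAY_DROP_MS:
            fishy_times.append(now)
        while fishy_times and now - fishy_times[0] > FISHY_WINDOW_S:
            fishy_times.popleft()
        return len(fishy_times) >= FISHY_COUNT_FOR_CUT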

>
>     If we see a direct loss without a significant reduction in delay, we
>     need to assume it's either a congested link (not a physical layer
>     limit we're hitting on an idle link) or it's a "random" loss. Losses
>     on a congested link also indicate a full queue, and so even though
>     the delay stopped increasing (or stayed on-average stable after
>     you've hit the max and started dropping), you still want to decrease
>     sending rate.  If it's a "random" loss (and not AQM), this causes a
>     minor under-utilization, but for a truly congested link or if AQM is
>     causing "random" drops, it's a signal we should reduce to try to
>     start the queue draining.   To avoid over-reaction and 'hunting', it
>     may be good to use some type of threshold, perhaps averaged or
>     filtered.  If the loss rate reached ~5%, I'd make a mild cut in
>     sending rate on top of any filter-suggested cuts, and if it reached
>     ~10% I'd make a strong cut.
>
>     (to be continued in another post)
>
>
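
The ~5%/~10% thresholds in the quoted text map onto a small rule applied 
on top of any filter-suggested cuts; here's a sketch (the thresholds are 
from the text above, the cut fractions are illustrative assumptions):

    def extra_cut(rate_bps, loss_fraction):
        """Extra rate cut based on loss rate, applied on top of whatever
        the delay filter suggests.  Cut sizes are assumptions."""
        if loss_fraction >= 0.10:
            return rate_bps * 0.80   # strong cut
        if loss_fraction >= 0.05:
            return rate_bps * 0.95   # mild cut
        return rate_bps
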
-- 
Randell Jesup
randell-ietf at jesup.org

