[R-C] RRTCC issues: loss, decrease rate

Alexander Kjeldaas astor at google.com
Tue Aug 7 14:37:29 CEST 2012


On Mon, Aug 6, 2012 at 6:10 PM, Randell Jesup <randell-ietf at jesup.org> wrote:

> This is the first of a few messages on the details of RRTCC I expect to
> post, based on analysis (not testing) of the algorithm, and my 8-year
> experience with an algorithm I designed which had very similar theory and
> underpinnings.  This has been used (mostly in residential settings) in
> hundreds of thousands of devices mostly for peer-to-peer calls.
>
> Please feel free to critique!  I make no assertions that this analysis is
> guaranteed correct (and in fact I'm sure it will be pointed out several
> ways in which it's wrong), but I think it will be a helpful starting point.
>  I also realize there are some simplifications assumed below; I've tried to
> note them.
>
>
> The first focus is on loss.  I'll primarily focus on the impact of
> tail-drop policies for now.
>
> Loss affects a stream in a number of ways:
>
> 1) Directly - loss of a packet in the RTP stream
> 2) Indirectly - loss of a packet for another traffic stream, for this or
> another destination
> 3) "Random" loss (non-congestion)
>
> Note: Not all channels experience "random" loss, though some "random" loss
> is momentary congestion of a core router or some type of AQM.  Since in
> this case we're focusing on tail-drop routers (at the bottleneck), we'll
> assume this category can be modeled as non-congestive random losses.
>
> Obviously the more inputs and knowledge of the streams (especially between
> the two endpoints, via any port, not just the 5-tuple) the better the model
> can perform.  A confounding issue to be discussed later is how differential
> packet marking affects congestion and bandwidth models.
>
> Since RRTCC works from inter-packet-delays (when compared to sending
> times, either from RTP timestamps or from header extensions with
> closer-to-the-stack sending times), let's look at how these types of loss
> affect the signals seen by the receiver and the Kalman filter.
>
> 1) Direct loss
> -------------------
> In this case, we'll see the loss directly.  This is common in access-link
> self-congestion, where our stream(s) are the only significant user of
> bandwidth or if there are a very few users, and we've exceeded the capacity
> of the physical link.
>
> The inter-packet delay was likely increasing steadily leading up to the
> loss.  For example, if the other side is sending at 220Kbps at 20ms
> intervals with an access bottleneck (upstream at their side, or downstream
> at our side) of 200Kbps, and there's no other traffic on the link, then the
> packets would have been coming in about 22ms apart instead of 20ms.
>
> Let's assume a very short queue depth (no bufferbloat!) of 2 packets
> (40ms), and look what happens.  For whatever reason (packet loss, noisy
> signal to the filter, long RTT, cross-traffic that just went away, etc)
> let's assume that the system didn't react before the loss happened.
>
> When the packet is lost, there must have been 2 packets in the buffer.
>  The receiver will see packet N, N+1, and then N+3 and N+4. N will have
> been delayed around 38ms, N+1 around 40ms, N+3 about 22ms, and N+4 around
> 24ms.  Pictorially, this would look like a sawtooth in the delay profile.
>
> If you naively input these into the filter, it will start to move down
> away from a 10% slope, and might indicate a flat queue-depth profile or
> perhaps even negative (draining) for a short time, when in fact we've been
> over-bandwidth the entire time.


Averaged over a sawtooth, the queue depth *is flat*, right?  So it is
rather a question of doing correct detection based on the output of the
Kalman filter, not of the Kalman filter failing to estimate what is
happening.

Also, for the Kalman filter in RRTCC, the uncertainty of the samples will
increase in this case, which makes the filter slower.
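
To make that concrete, here is a back-of-the-envelope simulation of the
quoted example (packets sent every 20 ms into a bottleneck that needs 22 ms
per packet, behind a roughly two-packet tail-drop queue).  The queue model,
constants and names are my own simplifications, not anything from the draft;
the point is just that the receiver-side delay deltas sit at a steady +2 ms
with a large negative spike at each loss, and average out to roughly zero
over a full sawtooth period.

    # Sketch only (my assumptions, not the draft): 20 ms send spacing, 22 ms
    # serialization time at the bottleneck, tail drop once more than two
    # packets' worth of work is queued.  We look at the receiver-side deltas
    # d(i) = (recv(i) - recv(prev)) - (send(i) - send(prev)) that RRTCC feeds
    # to its arrival filter.

    SEND_INTERVAL = 20.0   # ms between packets at the sender
    SERVICE_TIME  = 22.0   # ms to serialize one packet at the bottleneck
    QUEUE_LIMIT   = 2      # packets' worth of backlog before tail drop

    def simulate(n_packets=40):
        delivered = []          # (seq, send_time, recv_time)
        busy_until = 0.0        # when the bottleneck finishes its backlog
        for seq in range(n_packets):
            t_send = seq * SEND_INTERVAL
            backlog = max(0.0, busy_until - t_send)        # queued work, ms
            if backlog > QUEUE_LIMIT * SERVICE_TIME:       # tail drop
                continue
            t_recv = max(busy_until, t_send) + SERVICE_TIME
            busy_until = t_recv
            delivered.append((seq, t_send, t_recv))
        return delivered

    def deltas(delivered):
        # Change in one-way delay between consecutive *received* packets.
        return [(s1, (rx1 - rx0) - (tx1 - tx0))
                for (s0, tx0, rx0), (s1, tx1, rx1)
                in zip(delivered, delivered[1:])]

    ds = deltas(simulate())
    for seq, d in ds:
        print("packet %2d: delta = %+5.1f ms" % (seq, d))
    print("average delta: %+.2f ms" % (sum(d for _, d in ds) / len(ds)))

The deltas stay at +2 ms while the queue fills, then jump to about -18 ms
across each loss, which is the sawtooth from the example.
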


>  This would depend on the exact filter type and usage, but this certainly
> violates some of the assumptions about error in a Kalman filter (for
> example, Gaussian distribution).  At least in this case, a Kalman filter
> might not be the optimum choice.
>
>
In the non-Gaussian case, the Kalman filter is the best linear (LMMSE)
state estimator, while in the Gaussian case it is the optimal (MMSE) state
estimator.

So while it is certainly not the optimum choice, it isn't obviously broken
either.

There are computationally more expensive filters, such as ensemble Kalman or
particle filters, but it is not clear that they would improve significantly
over a straight Kalman filter.  Some sort of simulation could sort this out.

Here is a paper where they did this for bi-Gaussian and tri-Gaussian
systems.  I am not a statistician, so I can only skim this and sort-of
kind-of see what is happening, but it could be an "inspiration" to further
work:

 http://www.control.isy.liu.se/~fredrik/reports/05ifac_hendeby.pdf
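
To put the "slower filter" point in one dimension: below is a scalar
Kalman-style update.  It is not the draft's arrival filter (which, as I read
it, tracks a two-element state, roughly inverse capacity plus an offset);
the names and noise values are mine, chosen only to show how the
measurement-noise variance r scales the gain and hence how quickly the
estimate follows the delta samples.

    def kalman_step(x_hat, p, z, q, r):
        """One predict/update step for a scalar random-walk state.
        x_hat: estimate, p: estimate variance, z: measurement,
        q: process-noise variance, r: measurement-noise variance."""
        p = p + q                       # predict: the state may have drifted
        k = p / (p + r)                 # gain: small r means k near 1 (fast)
        x_hat = x_hat + k * (z - x_hat)
        p = (1.0 - k) * p
        return x_hat, p

    # The same sawtooth-ish delta stream, tracked with low vs. high noise r.
    samples = [2.0] * 10 + [-18.0] + [2.0] * 10
    for r in (1.0, 50.0):
        x, p = 0.0, 1.0
        for z in samples:
            x, p = kalman_step(x, p, z, q=0.01, r=r)
        print("r = %5.1f  final estimate %+0.2f" % (r, x))

The smaller r tracks the +2 ms samples much more closely; the larger r
leaves the estimate close to its prior.  Increasing the sample uncertainty is
exactly what slows the filter down.
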


* Possible modifications:
> a) drop samples where there's a loss, since losses frequently perturb the
> delta of the next packet
>     This will reduce the 'noise' in the inputs to the filter, especially
> in simple cases like above.
>

Which samples in your example would you not feed to the Kalman filter?
Deciding when to start feeding samples into the Kalman filter is another
classification problem, or it can be done ad-hoc (ad-hoc ~=
non-probabilistic).

Also, reducing the 'noise' in the inputs to the filter makes it react
faster, which is not necessarily what you want in this particular case.

Note that there is an outlier filter in RRTCC: a sample is capped at 3
standard deviations from the (exponentially decayed) jitter estimate.

(end of section 3.1)
http://tools.ietf.org/html/draft-alvestrand-rtcweb-congestion-00#section-3.1
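
For reference, here is roughly what that capping looks like.  The smoothing
constant, the use of a decayed mean, and whether the clamped or the raw
sample updates the statistics are my assumptions for illustration, not
something I have checked against the draft text.

    import math

    ALPHA = 0.05   # weight of the newest sample (assumed, not from the draft)

    class DeltaCapper:
        """Cap each delay-delta sample at +/- 3 standard deviations around
        an exponentially decayed mean/variance estimate."""
        def __init__(self):
            self.mean = 0.0
            self.var = 1.0

        def cap(self, d):
            std = math.sqrt(self.var)
            clamped = max(self.mean - 3 * std, min(self.mean + 3 * std, d))
            # Update the decayed statistics with the clamped sample.
            self.mean = (1 - ALPHA) * self.mean + ALPHA * clamped
            self.var = (1 - ALPHA) * self.var + ALPHA * (clamped - self.mean) ** 2
            return clamped

    capper = DeltaCapper()
    for d in [2.0] * 10 + [-18.0] + [2.0] * 5:
        print("raw %+6.1f ms  fed to filter %+6.1f ms" % (d, capper.cap(d)))

Run on the sawtooth deltas from the earlier sketch, the -18 ms spike gets
clamped to a few ms below the mean, and the tracked variance only widens
after the spike, which is the "cannot react instantaneously" point I come
back to below.
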



> b) increase the "uncertainty" of the next packet dramatically in the input
> to the filter
>     This is an attempt to get the Kalman filter to weight the packet much
> lower, but not ignore it entirely.  I don't see this as very useful in
> practice.
>

I think this is conceptually cleaner.  What RRTCC does now is calculate the
exponentially decayed jitter of incoming packets.  In the sawtooth example,
in stock RRTCC, the jitter variance will increase, which means that the
filter will become slower.  I think this is a good property.

The exponential-decay jitter calculation means that this will not happen
on the *first loss*, but some time thereafter (i.e. the filter cannot react
instantaneously).

By using extra knowledge, we can probably build a more accurate model.  My
point is that it is not "obviously broken".  A model in the filter that
accounts for an absolute queue length while still retaining the current
features would seem to be non-linear though, and thus require a different
filter approach.


> c) look at whether there was a significant drop in delay following a loss,
> and use that as a separate indicator that we're over-bandwidth
>
>
There will only be a significant drop in delay if the queue is short (no
bufferbloat).  To know whether a drop is "significant", a notion of queue
length must be added to the model of the network.

The larger the queue and the higher the bandwidth, the smaller the drops in
delay we are looking for; and assuming that the queue sits at the sender's
egress, additional jitter is added along the rest of the path.  Thus, on the
receiver side, ad-hoc (non-probabilistic) filtering could easily be far from
optimal.

I think it would certainly be interesting to look at a model that includes
queue length as well as delay, but I think it should be probabilistic in
nature, probably inside the filter, not as ad-hoc filtering on the outside.
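
Just to make this concrete, here is what a purely ad-hoc "loss followed by a
delay drop" check could look like (a toy illustration of the "fishy loss"
idea quoted below, not anything taken from RRTCC).  The 10 ms threshold and
the trace are made up, which is really the point: without some notion of the
queue length it is unclear what "significant" should mean on a given path.

    FISHY_DROP_MS = 10.0   # "significant" delay drop; an arbitrary assumption

    def count_fishy_losses(packets):
        """packets: list of (seq, one_way_delay_ms) for packets that arrived.
        Count sequence-number gaps that are immediately followed by a delay
        drop of at least FISHY_DROP_MS relative to the packet before the
        gap."""
        fishy = 0
        for (s0, d0), (s1, d1) in zip(packets, packets[1:]):
            lost_in_between = (s1 - s0) > 1
            if lost_in_between and (d0 - d1) >= FISHY_DROP_MS:
                fishy += 1
        return fishy

    # Sawtooth trace: delay ramps 20..40 ms, seq 11 is lost, delay falls back.
    trace = [(n, 20 + 2 * n) for n in range(11)]
    trace += [(n, 20 + 2 * (n - 12)) for n in range(12, 20)]
    print("fishy losses:", count_fishy_losses(trace))   # prints 1

On an idle access link this works, but with a deep queue or lots of
cross-traffic the post-loss drop shrinks into the jitter and the threshold
stops meaning much, which is why I would rather see this inside a
probabilistic model than bolted on outside.
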

Alexander


> In my algorithm, I termed these "fishy" losses - they implied a queue
> suddenly shortened.  Too many (more than 1) of these in a short period of
> time meant I would decrease sending bandwidth by a significant extra
> amount, since the implication is not just delay, but a full queue, which I
> really want to get out of fast.  This is a form of modification 'c'.
>
> If we're not the only stream on the bottleneck, and we just got "unlucky"
> and had our packet get dropped, similar issues may occur, but the reduction
> in delay of the next packet may be less or much less, since we're typically
> metering in our packets at a frame rate.  With enough other traffic, the
> delay of the next packet will be roughly flat (or 40ms in the case above).
>  So this modification (c) is of most use in detecting the bandwidth/queue
> limit of a relatively idle link, typically an access (especially upstream)
> link.  It therefore may want to be combined with other mechanisms.
>
> If we see a direct loss without a significant reduction in delay, we need
> to assume it's either a congested link (not a physical layer limit we're
> hitting on an idle link) or it's a "random" loss. Losses on a congested
> link also indicate a full queue, and so even though the delay stopped
> increasing (or stayed on-average stable after you've hit the max and
> started dropping), you still want to decrease sending rate.  If it's a
> "random" loss (and not AQM), this causes a minor under-utilization, but for
> a truly congested link or if AQM is causing "random" drops, it's a signal
> we should reduce to try to start the queue draining.   To avoid
> over-reaction and 'hunting', it may be good to use some type of threshold,
> perhaps averaged or filtered.  If the loss rate reached ~5%, I'd make a
> mild cut in sending rate on top of any filter-suggested cuts, and if it
> reached ~10% I'd make a strong cut.
>
> (to be continued in another post)
>
> --
> Randell Jesup
> randell-ietf at jesup.org
>