[R-C] [ledbat] LEDBAT vs RTCWeb

Jim Gettys jg at freedesktop.org
Fri Apr 20 17:03:52 CEST 2012


On 04/20/2012 10:48 AM, Randell Jesup wrote:
> On 4/20/2012 7:55 AM, Mirja Kuehlewind wrote:
>> Hi Randell,
>>
>> I didn't follow the whole discussion, but regarding LEDBAT we have a
>> TARGET delay of at most 100ms; that means you can choose a smaller
>> one.  We chose 100ms as the maximum because there is an ITU
>> recommendation (G.114) that 150ms of delay is acceptable for most
>> user voice applications, and we wanted to be sure to stay below
>> that.
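>>
>> For concreteness: the RFC 6817 controller adjusts cwnd in
>> proportion to how far the measured queuing delay is off TARGET.  A
>> minimal sketch in Python (the constants and the helper's shape are
>> illustrative, not the normative pseudocode):
>>
>>   TARGET = 0.100  # seconds; the cap discussed above
>>   GAIN = 1.0      # at most one MSS per RTT of ramp-up (RFC 6817)
>>   MSS = 1500      # bytes
>>
>>   def on_ack(cwnd, bytes_newly_acked, current_delay, base_delay):
>>       # Queuing delay is the current one-way delay over the
>>       # observed base (minimum) delay.
>>       queuing_delay = current_delay - base_delay
>>       # off_target > 0 below TARGET (grow the window), < 0 above
>>       # it (shrink); at TARGET the window holds steady.
>>       off_target = (TARGET - queuing_delay) / TARGET
>>       cwnd += GAIN * off_target * bytes_newly_acked * MSS / cwnd
>>       return cwnd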
>
> I'm afraid you've misunderstood the 150ms number.  That's the "knee"
> in the curve for mouth-to-ear delay, not congestion delay or even
> end-to-end packet delay.  (And it's more complicated than that: the
> 150ms number depends on the amount of echo from the far end - with
> high echo (poor or no echo cancellers), it can be smaller.)  And you
> can get asymmetric effects where the delays in each direction are
> unequal.
>
> For VoIP communication, the delay budget roughly looks like this:
>
> frame size - typ. 20-30ms (you have to buffer up one packet to send)
> echo cancellation - typ. 1-3ms?
> encoder delay - typ. 1-2ms?
> algorithmic delay - typ. 0-5ms
> packetization, output queuing, etc. - 0-10ms (typically low/nil)
> unloaded (no local-loop queuing) transit time - typically 20-100ms
> queuing delay - ?
> jitter buffer depth - typically 20-60ms
> decoder, time-scale modifier (adaptive jitter buffer) - 0-2ms?
> rebuffering into audio frames for drivers - typ. 1/2 frame size
> (5-10ms)
> other random signal processing - 0-2ms?
> output device driver buffering (and reframing in OS frame-size
> chunks - typ. 16ms on Linux for 8kHz audio) - typ. 10ms?  Longer on
> some OSes!
> hardware buffers - ?
>
> This is somewhat abstract, and people could argue with the numbers
> or the list, but it should give you an idea that queue depth is far
> from the only item (though it is the most variable one).  Almost all
> of these are fixed and/or small, except transit time, queuing delay,
> and jitter buffer depth (which is indirectly affected by queuing).
>
> Take the fixed/small items off the 150ms, and you are probably left
> with 80-100ms (if you're lucky; 50-80ms if you're not) for transit,
> jitter buffer, and queuing (and video calls can be a bit worse, with
> longer frame lengths, more jitter, and often longer hardware
> queues).  So, to stay under 150ms on local hops (with a fast access
> link at both ends), you need moderate jitter and can probably
> tolerate some static queuing (<25-50ms).  For longer routes and/or
> slower access links (DSL), there is basically no budget for standing
> queues, especially as jitter is typically higher.
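>
> As a back-of-the-envelope check (using midpoints of the typical
> values from the list above; the numbers are illustrative), a quick
> Python sketch:
>
>   # Typical "fixed/small" items from the budget above, in ms.
>   fixed = {
>       "frame size": 25,
>       "echo cancellation": 2,
>       "encoder delay": 1.5,
>       "algorithmic delay": 2.5,
>       "packetization/output queuing": 5,
>       "decoder/time-scale modifier": 1,
>       "rebuffering for drivers": 7.5,
>       "other signal processing": 1,
>       "output driver buffering": 10,
>   }
>   left = 150 - sum(fixed.values())
>   print(f"left for transit + jitter buffer + queuing: {left}ms")
>   # -> 94.5ms, i.e. right in the 80-100ms range above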
>
> I guarantee you that any VoIP engineer seeing "100ms queuing delay"
> will feel their heart sink about conversational quality.  Yes, you
> can have calls.  Yes, they *will* suffer the typical awkward pauses
> and talkover.  You'll probably end up in the 200-300ms mouth-to-ear
> range, which isn't at the ~400-500ms "What do you think? Over!"
> walkie-talkie level, but it is uncomfortable.
>
> And that's assuming old-style, inflexible VoIP UDP streams (G.711,
> G.722, G.729 (ugh)).  Once you add video with bandwidth adaptation
> or adaptive audio codecs, interacting with LEDBAT gets painful if
> the VoIP stream uses a delay-sensing protocol (and it really, really
> wants to).
>
>> If you choose a delay-based congestion control, I don't think your
>> problem is LEDBAT but standard loss-based TCP, which will
>> frequently fill up the queue completely.
>
> It's true that delay-based control can't "beat" a loss-based TCP
> flow that runs long enough, but luckily most TCP flows are
> relatively short and/or bursty, especially across the access link.
>
>> Maybe you don't want to look at the total queuing delay but at the
>> changes in
>> queuing delay...? LEDBAT will keep the delay constant.
>
> RRTCC and similar algorithms do not use one-way-delay (OWD)
> estimates, and so are less sensitive to mismeasurement of the base
> delay (which, from the LEDBAT simulations, can cause problems).
> RRTCC works entirely from deltas in inter-packet delay to determine
> whether the queue is growing, shrinking, or stable.  After a queuing
> event is observed (the queue growing enough to trigger a signal from
> the filter), it drops bandwidth and tries to stay down (not probing
> for extra bandwidth) until the queue has drained and is once again
> stable.  This generally lets it make close to full use of the
> available bandwidth with close-to-zero-length queues.  It does
> generally value low queues over 100% bandwidth efficiency.
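>
> A minimal sketch of that core signal in Python (this is not the
> draft's actual estimator, which is Kalman-like; the class, smoothing
> constant, and threshold here are illustrative):
>
>   def inter_packet_delta(send_prev, send_cur, arr_prev, arr_cur):
>       # > 0: the pair spread out in transit (queue grew);
>       # < 0: it compressed (queue drained).  No absolute OWD, and
>       # no base-delay estimate, is ever needed.
>       return (arr_cur - arr_prev) - (send_cur - send_prev)
>
>   class QueueTrend:
>       def __init__(self, alpha=0.1, threshold_ms=1.0):
>           self.est = 0.0            # smoothed delta estimate (ms)
>           self.alpha = alpha
>           self.threshold = threshold_ms
>
>       def update(self, delta_ms):
>           self.est = (1 - self.alpha) * self.est + self.alpha * delta_ms
>           if self.est > self.threshold:
>               return "overuse"      # queue growing: drop bandwidth
>           if self.est < -self.threshold:
>               return "underuse"     # queue draining
>           return "stable"           # only probe again once stable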
>
>
Thank you very much for all of this: I've been very aware of all of
these effects (I did an audio server in the early '90s called AF,
which a few of you may remember).

But spelling all of that out in my mail would have obscured the point
I was really trying to drive home: the queueing delays are so huge
that, even when *ignoring* all the rest of the latency budget, we
really have to fix them; by themselves they are badly unacceptable.
We have to do away with the uncontrolled, fixed-size (usually grossly
bloated), single-queue edge devices currently in the Internet.  And
that implies both fancy queuing and AQM that can handle the (often
hugely variable) bandwidths we now see at the edge.

My personal target for queuing delay for RT-sensitive traffic is to
get it below 10ms at the edge (presuming the broadband technology
doesn't also get you: e.g., I measure about 9ms of latency on my
Comcast cable link).  This is particularly interesting given the 4ms
quantisation in 802.11.

Fundamentally, you *never* want to give latency away: you can *never*
get it back, and there has to be time left for other processes to "do
their thing" (e.g. echo cancellation, jitter buffering, etc.).
Burning *any* latency unnecessarily just makes everything else
harder....
                                   - Jim




