[R-C] Congestion Control BOF

Tue Oct 11 04:08:24 CEST 2011

On 10/8/2011 11:29 PM, Justin Uberti wrote:
>
>
> On Sat, Oct 8, 2011 at 10:39 PM, Randell Jesup <randell-ietf at jesup.org
> <mailto:randell-ietf at jesup.org>> wrote:
>
>     Well, I'm probably being overly-worried about processing delays (and
>     in particular differing delays for audio and video).  Let's say
>     audio gets sampled at X, and (ignoring other processing steps) takes
>     1ms to encode.  It gets to the wire at X + <other steps> + 1.  Lets
>     say video is also sampled at X, and (ignoring other processing
>     steps) takes 10ms to encode.  It gets to the wire at X + <other
>     steps> + 10.  So we've added a 9ms offset to all our A/V sync, and
>     in this case it's in the "wrong" direction (people are more
>     sensitive to early-audio than early-video). And if "other steps" on
>     each side don't balance (and they may not), it could be worse.  I
>     also worry more that in a browser, with no access to true RT_PRI
>     processing, the delays could be significantly variable (we get
>     preempted by some other process/thread for 10 or 20ms, etc).  Also,
>     if the receiver isn't careful it could be tricked into skipping
>     frames it should be displaying due to jitter in the packet-to-packet
>     timestamps.
>
>
>     So perhaps I'm not being overly-worried.  I realize that I'm trading
>     off accuracy in bandwidth estimation (or if you prefer, reaction
>     speed) for ease in getting a consistent framerate and best-possible
>     A/V sync.
>     In a perfect world we'd record the sampling time and the delta until
>     it was submitted to sendto(), so we'd have both.  (You could use a
>     header extension to do that).
>
>
> There's a lot more going on here. The algorithmic delays for audio and
> video will often be different, the capture delays perhaps wildly so. In
> addition, you won't want to just dump the video directly onto the wire -
> typically it will be leaked out over some interval to avoid bandwidth
> spikes, and the audio will have to maintain some jitter buffer to
> prevent underrun - so I think the encoding processing deltas will be
> nominal compared to the other delays in the pipeline.

Sure - though you have the sampling time of the audio and video, and if 
you do your job right on the playback side, they'll be rock-solid synced 
(and that can be done even if there's static drift between the audio and 
video timestamp clocks).  So long as you don't use time-on-wire 
timestamps...

> I think this also does illustrate why having "time-on-wire" timestamping
> is really useful for increasing estimation accuracy :-)

BTW, I was serious when I said you could improve on this with an RTP 
header extension with "time-on-the-wire" delta from sample time. 
However, I don't think we need this here.  As it would be totally 
optional and ignored, that could be added later.

-- 
Randell Jesup
randell-ietf at jesup.org