[R-C] Strawman: Requirements

Randell Jesup randell-ietf at jesup.org
Wed Oct 19 00:06:52 CEST 2011


On 10/18/2011 10:07 AM, Harald Alvestrand wrote:
> I think that in order to get off ground zero here, both in
> implementation, testing and navigating the tortuous path towards
> standards-track adoption, we should throw out some requirements.

Thanks Harald; I was hoping to get some out there, but was busy with 
build-system fun.... :-(

> And in the spirit of rushing in where angels fear to tread, I'm going to
> throw out some.
>
> MEASUREMENT SETUP
>
> Imagine a system consisting of:
>
> - A sender A attached to a 100 Mbit LAN, network X
> - A bandwidth constricted channel between this LAN and another LAN,
> having bandwidth BW

It may not matter here, but bandwidths are normally asymmetric; if we 
care about both directions (I think we don't in this scenario), we 
should specify directionality.  If we don't care about both, we should 
specify which direction we do care about.

> - A recipient B also attached to a 100 Mbit LAN, network Y
> - A second sender A' attached to network X, also sending to B
> - Other equipment that generates traffic from X to Y over the channel

Also note that RTT between A and B is important.

If you want to simulate a more real-world situation (and I'm not 100% 
certain it's needed here), you need a third network Z to simulate 
traffic from A' to network nodes that aren't on network Y.  That is, 
each side may compete with traffic that uses its own access link but 
doesn't cross the far-end access link.  (And, in fact, this is the 
primary interfering traffic if an access link is the bottleneck.)  In 
the case above, other than RTT and loss issues with feedback, it 
doesn't matter so long as you're only looking at one side.

The reason it may matter is that the bottleneck link is usually the 
upstream access link (at least for 1-on-1 communication), and 
subsequent traversals, including the BW-constrained far-end downstream, 
can affect the buffer-fullness sensing.

So, you need something more like:

  - A sender A attached to a 100 Mbit LAN, network X
  - A bandwidth constricted channel between this LAN and another LAN
   (network Z), having bandwidth BW
  - A recipient B also attached to a 100 Mbit LAN, network Y, which
    is attached to network Z with bandwidth BW2
  - A second sender A' attached to network X, also sending to B
  - Other equipment that generates traffic between X and Z over
    the channel
  - Other equipment that generates traffic between Y and Z over
    the channel
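
To make that concrete for an emulation harness, something like the 
table below; the names and numbers are placeholders I made up, not 
recommendations:

  # Hypothetical link table for an emulation setup; rates, delays and
  # buffer sizes are placeholders, not recommendations.
  links = {
      ("X", "Z"):  {"rate_bps": 1_000_000,   "owd_ms": 20, "buf_bytes": 64_000},  # BW
      ("Z", "Y"):  {"rate_bps": 2_000_000,   "owd_ms": 20, "buf_bytes": 64_000},  # BW2
      ("A", "X"):  {"rate_bps": 100_000_000, "owd_ms": 1,  "buf_bytes": None},
      ("A'", "X"): {"rate_bps": 100_000_000, "owd_ms": 1,  "buf_bytes": None},
      ("B", "Y"):  {"rate_bps": 100_000_000, "owd_ms": 1,  "buf_bytes": None},
  }
  # A->B traverses X-Z then Z-Y; cross traffic from X or Y can terminate
  # inside Z without ever touching the other side's access link.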

> All measurements are taken by starting the flows, letting the mechanisms
> adapt for 60 seconds, and then measuring bandwidth use over a 10-second
> interval.

Continuous flows are not a good model.  The "other equipment" will be a 
mix of continuous flows (BitTorrent, and YouTube to a degree) and 
heavily bursty flows (browsing).  We need to show it reacts well to a 
new TCP flow, to a flow going away, and to a burst of flows like a 
browser makes (e.g., the schedule sketched below).

Also that it coexists with other rtcweb flows!  (which I note you 
include below)
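
For example, a single test run might drive the competing traffic on a 
schedule instead of as one continuous flow; the times and flow types 
below are just an illustration:

  # Illustrative cross-traffic schedule for one run (times in seconds);
  # the adaptation should cope with each transition.
  schedule = [
      (0,   "start", "rtcweb A->B"),
      (30,  "start", "TCP bulk X->Y"),       # a new long-lived TCP flow appears
      (90,  "stop",  "TCP bulk X->Y"),       # the flow goes away; we should recover
      (120, "burst", "10 short TCP flows"),  # browser-like burst of connections
      (150, "start", "rtcweb A'->B"),        # a second rtcweb flow joins
  ]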

> This seems like the simplest reasonable system. We might have to specify
> the buffer sizes on the border routers to get reproducible results.

And the buffer-management policy (RED, tail-drop, etc).  I assume 
tail-drop is by far the most common and what we'd model?
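
For reproducibility, a tail-drop bottleneck could be modeled as simply 
as this (a toy sketch, not any particular router's behavior):

  from collections import deque

  class TailDropQueue:
      """Toy drop-from-tail FIFO, limited by total bytes queued."""
      def __init__(self, limit_bytes):
          self.limit = limit_bytes
          self.used = 0
          self.pkts = deque()

      def enqueue(self, size_bytes):
          if self.used + size_bytes > self.limit:
              return False              # tail drop: arriving packet is discarded
          self.pkts.append(size_bytes)
          self.used += size_bytes
          return True

      def dequeue(self):
          size_bytes = self.pkts.popleft()
          self.used -= size_bytes
          return size_bytes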

> PROTOCOL FUNCTIONS ASSUMED AVAILABLE
>
> The sender and recipient have a connection mediated via ICE, protected
> with SRTP.
> The following functions need to be available:
>
> - A "timestamp" option for RTP packets as they leave the sender

Are you referring to setting the RTP timestamp with the 
"close-to-on-the-wire" time?  I have issues with this...  If it's a 
header extension (as mentioned elsewhere) that's better, but it still 
adds a significant amount of bandwidth in some lower-bandwidth 
(audio-only) cases.

The RTP-tfrc draft message looks like 12 bytes per RTP packet.  If we're 
running say Opus at ~11Kbps @ 20ms, which including UDP overhead uses 
~27Kbps (not taking into account SRTP), this would add 4800bps to the 
stream, or around 18%.  Note that unlike TFRC, you need them on every 
packet, not just once per RTT.

Note that for a 128Kbps audio+video (30fps) call, the overhead would be 
~7680bps, or 6%, which isn't bad.  At higher rates it goes down 
proportionally.  It's higher than I'd like.... but not problematically 
high.  We should evaluate how much it helps in practice versus 
sample-time timestamps.
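
For reference, the arithmetic above, with my assumed packet rates 
(~50 packets/s for 20ms Opus, ~80 packets/s combined for the 
audio+video case):

  # Per-packet header-extension overhead, assuming 12 bytes per packet.
  EXT_BYTES = 12

  def ext_overhead_bps(packets_per_sec):
      return EXT_BYTES * 8 * packets_per_sec

  # Audio-only: Opus @ ~11Kbps, 20ms frames -> 50 packets/s, ~27Kbps on
  # the wire including IP/UDP/RTP headers.
  print(ext_overhead_bps(50))                  # 4800 bps
  print(100 * ext_overhead_bps(50) / 27000)    # ~18% of the stream

  # Audio+video at 128Kbps total, assuming ~80 packets/s combined.
  print(ext_overhead_bps(80))                  # 7680 bps
  print(100 * ext_overhead_bps(80) / 128000)   # 6%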

> - A "bandwidth budget" RTCP message, signalling the total budget for
> packets from A to B

Ok.

> REQUIRED RESULTS
>
> In each measurement scenario below, the video signal must be
> reproducible, with at most 2% measured packet loss. (this is my proxy
> for "bandwidth must have adapted").
>
> If A sends one video stream with max peak bandwidth < BW, and there is
> no other traffic, A will send at the maximum rate for the video stream.
> ("we must not trip ourselves up").

Note: we need to send both video and audio streams, though that might 
not be needed here.  It matters in that both are adaptable, but on very 
different scales; we need two streams with very different bandwidths to 
adapt "reasonably" while staying within the boundary.  Note that in 
real apps we may give the app a way to select the relative bandwidths 
and override our automatic mechanisms (i.e. rebalance within our 
overall limit), along the lines of the sketch below.
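
As a strawman, "rebalance within our overall limit" could look 
something like this; the function name, floors and weights are all 
hypothetical:

  # Hypothetical split of a total budget across streams, honoring
  # per-stream floors plus app-supplied relative weights for the rest.
  def split_budget(total_bps, streams):
      floor = sum(s["min_bps"] for s in streams)
      spare = max(0, total_bps - floor)
      weight_sum = sum(s["weight"] for s in streams)
      return [s["min_bps"] + spare * s["weight"] / weight_sum
              for s in streams]

  audio = {"min_bps": 12000, "weight": 1}   # keep audio usable at all times
  video = {"min_bps": 30000, "weight": 9}   # video soaks up most of the spare
  print(split_budget(128000, [audio, video]))   # [20600.0, 107400.0]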

> If A and A' each send two video streams with MPB > 1/2 BW, the traffic
> measured will be at least 30% of BW for each video stream. ("we must
> play well with ourselves")

What are the measurement period and measurement point?  These rates are 
likely to drift somewhat and may take a short time to settle down.

> If A sends a video stream with MPB > BW, and there is a TCP bulk
> transfer stream from LAN X to LAN Y, the TCP bulk transfer stream must
> get at least 30% of BW. ("be fair to TCP")
>
> If A sends a video stream with MPB > BW, and there is a TCP bulk
> transfer stream from LAN X to LAN Y, the video stream must get at least
> 30% of BW. ("don't let yourself be squeezed out")

Given enough time (minutes), we *will* lose bandwidth-wise to a 
continuous TCP stream (as we know).

> Mark - this is almost completely off the top of my head. I believe that
> failing any of these things will probably mean that we will have a
> system that is hard to deploy in practice, because quality will just not
> be good enough, or the harm to others will be too great.
>
> There are many ways of getting into trouble that won't be detected in
> the simple setups below, but we may want to leave those aside until
> we've shown that these "deceptively simple" cases can be made to work.
>
> We might think of this as "test driven development" - first define a
> failure case we can measure, then test if we're failing in that
> particular way - if no; proceed; if yes; back to the drawing board.

Or analyze why it failed and update the test or criteria.

We also need to cover our internal data streams, and multiple streams 
from the same client (as opposed to two separate clients as above).

I'll write more (and edit this more), but I'm out of time for right now.


-- 
Randell Jesup
randell-ietf at jesup.org

