[R-C] Strawman: Requirements

Varun Singh vsingh.ietf at gmail.com
Wed Oct 19 12:09:45 CEST 2011


Hi,

Comments inline.

On Wed, Oct 19, 2011 at 01:06, Randell Jesup <randell-ietf at jesup.org> wrote:
> On 10/18/2011 10:07 AM, Harald Alvestrand wrote:
>> MEASUREMENT SETUP
>>
>> Imagine a system consisting of:
>>
>> - A sender A attached to a 100 Mbit LAN, network X
>> - A bandwidth constricted channel between this LAN and another LAN,
>> having bandwidth BW
>
> It may not matter here, but normally bandwidths are asymmetric; if we care
> about both directions (I think we don't in this scenario), we should specify
> directionality.  If we don't care, we should specify which direct we do care
> about.
>
>> - A recipient B also attached to a 100 Mbit LAN, network Y
>> - A second sender A' attached to network X, also sending to B
>> - Other equipment that generates traffic from X to Y over the channel
>
> Also note that RTT between A and B is important.
>
> If you want to simulate a more real-world situation (and I'm not 100%
> certain it's needed here), you need a third network Z to simulate traffic
> from A' to network nodes that aren't on network Y.  I.e. each side may
> compete with traffic that uses the access link, but doesn't go across the
> far-end access link.  (And, in fact, this is the primary interfering traffic
> if an access link is the bottleneck.)  In the case above, other than RTT and
> loss issues with feedback, it doesn't matter so long as you're only looking
> at one side.
>
> The reason it may matter is that the bottleneck link is usually at the
> upstream (at least for 1-on-1 communication), and subsequent traversals
> including the BW-constrained far-end downstream can affect the
> buffer-fullness-sensing.
>
> So, you need something more like:
>
>  - A sender A attached to a 100 Mbit LAN, network X
>  - A bandwidth constricted channel between this LAN and another LAN
>  (network Z), having bandwidth BW
>  - A recipient B also attached to a 100 Mbit LAN, network Y, which
>   is attached to network Z with bandwidth BW2
>  - A second sender A' attached to network X, also sending to B
>  - Other equipment that generates traffic between X and Z over
>   the channel
>  - Other equipment that generates traffic between Y and Z over
>   the channel
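
(For the simulation scripts, I read the topology above roughly as the
following pseudo-configuration; plain Python used only as notation, and
the node names and unset bandwidth values (BW, BW2) are placeholders,
not proposed test parameters:)

  topology = {
      "links": {
          ("X", "Z"): {"bandwidth_bps": None},   # constricted channel, BW
          ("Z", "Y"): {"bandwidth_bps": None},   # far-end access link, BW2
      },
      "nodes": {
          "A":  {"net": "X", "sends_to": "B"},   # media sender
          "A'": {"net": "X", "sends_to": "B"},   # second sender
          "B":  {"net": "Y"},                    # recipient
      },
      "cross_traffic": [
          ("X", "Z"),   # competes on the near-end access link only
          ("Y", "Z"),   # competes on the far-end access link only
      ],
  }
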
>
>> All measurements are taken by starting the flows, letting the mechanisms
>> adapt for 60 seconds, and then measuring bandwidth use over a 10-second
>> interval.
>
> Continuous flows are not a good model.  The "other equipment" will be a mix
> of continuous flows (bittorrent, youtube (kinda), etc.) and heavily bursty
> flows (browsing).  We need to show it reacts well to a new TCP flow, to a
> flow going away, and to a burst of flows like a browser makes.
>

The problem I see with bursty traffic is how to measure fairness for
these short flows ("short" probably being 1-5 s). Is fairness the
completion time of the short flows (they shouldn't take more than 5x
longer to complete; 5x is just a number I chose)? Or is it a fair
share of the bottleneck bandwidth (BW/N per flow)?
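
One way to make both interpretations measurable (a rough Python sketch;
the numbers, the "idle-bottleneck" baseline, and the function names are
mine, purely for illustration):

  # Two candidate fairness metrics for short, bursty flows.
  def completion_time_ok(measured_s, isolated_s, limit=5.0):
      """Metric 1: the short flow should finish within `limit` times
      its completion time on an otherwise idle bottleneck."""
      return measured_s / isolated_s <= limit

  def jains_index(rates):
      """Metric 2: closeness to an equal BW/N share
      (1.0 = perfectly fair, 1/N = one flow takes everything)."""
      n = len(rates)
      return sum(rates) ** 2 / (n * sum(r * r for r in rates))

  # Hypothetical example: a web burst finishing in 4 s instead of 1.5 s,
  # and two flows splitting the bottleneck 70/30.
  print(completion_time_ok(4.0, 1.5))      # True (2.7x <= 5x)
  print(jains_index([0.7, 0.3]))           # ~0.86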

> Also that it coexists with other rtcweb flows!  (which I note you include
> below)
>
>> This seems like the simplest reasonable system. We might have to specify
>> the buffer sizes on the border routers to get reproducible results.
>
This is an important value, especially in light of the bufferbloat discussion.
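
If we do pin down buffer sizes, one defensible starting point is the
classic one-bandwidth-delay-product rule of thumb. A quick sketch with
assumed example values (BW = 1 Mbit/s, RTT = 100 ms; not a proposal)
shows how much the choice matters:

  # Buffer-sizing sketch; all values are assumed examples.
  bw_bps = 1000000          # bottleneck bandwidth BW
  rtt_s = 0.100             # round-trip time
  pkt_bits = 1500 * 8       # full-size packet

  print(bw_bps * rtt_s / pkt_bits)    # ~8 packets for one BDP
  print(bw_bps * 2.0 / pkt_bits)      # a "bloated" 2 s of buffering: ~167 packets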

> And the buffer-handling protocols (RED, tail-drop, etc).  I assume tail-drop
> is by far the most common and what we'd model?
>

In ns-2 simulations with cross-traffic we got better results by using
RED routers instead of drop-tail ones (for TFRC, TMMBR, NADU).
However, as mentioned before, AQM is not that common.
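
For reference, the core of the classic RED drop decision (gentle mode
and the count-based correction omitted) is roughly the following; the
thresholds here are illustrative defaults, not a recommendation:

  # Minimal RED sketch: EWMA of the queue length plus a linear drop ramp.
  def ewma_queue(avg_q, inst_q, w_q=0.002):
      return (1.0 - w_q) * avg_q + w_q * inst_q

  def red_drop_prob(avg_q, min_th=5, max_th=15, max_p=0.1):
      if avg_q < min_th:
          return 0.0               # no drops while the average queue is short
      if avg_q >= max_th:
          return 1.0               # behaves like drop-tail when overloaded
      return max_p * (avg_q - min_th) / (max_th - min_th)

  print(red_drop_prob(10))         # 0.05: early, probabilistic drops
  print(red_drop_prob(20))         # 1.0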

>> PROTOCOL FUNCTIONS ASSUMED AVAILABLE
>>
>> The sender and recipient have a connection mediated via ICE, protected
>> with SRTP.
>> The following functions need to be available:
>>
>> - A "timestamp" option for RTP packets as they leave the sender
>
> Are you referring to setting the RTP timestamp with the
> "close-to-on-the-wire" time?  I have issues with this...  If it's a header
> extension (as mentioned elsewhere) that's better, but a significant amount
> of bandwidth for some lower-bandwidth (audio-only) cases.
>
> The RTP-tfrc draft message looks like 12 bytes per RTP packet.  If we're
> running say Opus at ~11 Kbps @ 20 ms, which including overhead on UDP uses
> ~27Kbps (not taking into account SRTP), this would add 4800bps to the
> stream, or around 18%.  Note that unlike TFRC, you need them on every
> packet, not just one per RTT.
>

One optimization is to drop the 3-byte RTT field; just using the "send
timestamp (t_i)" field should be enough. Moreover, if there are other
RTP header extensions, the 4-byte 0xBEDE RTP header extension header is
common to all of them, in which case the send timestamp adds just 5
bytes {ID, LEN, 4-byte TS}.

http://tools.ietf.org/html/rfc5450 provides a mechanism to carry
transmission time offsets relative to the RTP timestamp, and it uses 3
bytes instead of 4 for the timestamp. I am sure that if we really
require such a timestamp, some optimizations can be done to save a byte
or two.
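
A back-of-the-envelope check of the overhead numbers above (the packet
rate is assumed: 50 packets/s for 20 ms Opus; the shared 0xBEDE header
and 32-bit padding are ignored):

  def ext_overhead_bps(bytes_per_pkt, pkts_per_s):
      return bytes_per_pkt * 8 * pkts_per_s

  print(ext_overhead_bps(12, 50))   # 4800 bps, ~18% of a ~27 kbit/s Opus stream
  print(ext_overhead_bps(5, 50))    # 2000 bps (ID/len byte + 4-byte TS), ~7%
  print(ext_overhead_bps(4, 50))    # 1600 bps with a 3-byte timestamp, ~6%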

> Note that for a 128 Kbps audio+video (30 fps) call, the overhead would be
> ~7680 bps, or 6%, which isn't bad.  At higher rates it goes down
> proportionally.  It's higher than I'd like... but not problematically high.
> We should evaluate how much it helps in practice versus sample-time
> timestamps.
>

I agree that we should evaluate whether the added precision is beneficial.

>> - A "bandwidth budget" RTCP message, signalling the total budget for
>> packets from A to B
>
> Ok.
>
>> REQUIRED RESULTS
>>
>> In each measurement scenario below, the video signal must be
>> reproducible, with at most 2% measured packet loss. (this is my proxy
>> for "bandwidth must have adapted").
>>
>> If A sends one video stream with max peak bandwidth < BW, and there is
>> no other traffic, A will send at the maximum rate for the video stream.
>> ("we must not trip ourselves up").
>
> Note: we need to send video and audio streams, but that might not be needed
> here.  But it's important in a way, in that both are adaptable but with very
> different scales.  We need both streams with very different bandwidths to
> adapt "reasonably" to stay in the boundary. Note that in real apps we may
> give the app a way to select the relative bandwidths and override our
> automatic mechanisms (i.e. rebalance within our overall limit).
>

There was some discussion at the beginning about whether the rate
control should be per flow or combined. Was there already consensus on
that, or is it something we still need to verify via simulations?

>> If A and A' each send a video stream with MPB > 1/2 BW, the traffic
>> measured will be at least 30% of BW for each video stream. ("we must
>> play well with ourselves")
>
> What's the measurement period and point, since these are likely to drift
> some and may take a short time to settle down?
>
>> If A sends a video stream with MPB > BW, and there is a TCP bulk
>> transfer stream from LAN X to LAN Y, the TCP bulk transfer stream must
>> get at least 30% of BW. ("be fair to TCP")
>>
>> If A sends a video stream with MPB > BW, and there is a TCP bulk
>> transfer stream from LAN X to LAN Y, the video stream must get at least
>> 30% of BW. ("don't let yourself be squeezed out")
>
> Given enough time (minutes), we *will* lose bandwidth-wise to a continuous
> TCP stream (as we know).
>

If there are 1 or 2 TCP streams (doing bulk transfer), a single RTP
stream can be made competitive; however, as the number of TCP streams
increases, RTP loses out. In Google's draft, they ignore losses up to
2%. In my own experiments, we were a bit more tolerant of inter-packet
delay.
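
To put the 2% figure in context, the TCP throughput equation used by
TFRC (RFC 5348) gives the rate a "TCP-fair" flow would target for a
given loss event rate; the packet size and RTT below are assumed
example values, and the RTO is the usual simplification of 4*RTT:

  from math import sqrt

  def tcp_friendly_rate_bps(p, s=1200, rtt=0.1, b=1):
      t_rto = 4 * rtt
      denom = (rtt * sqrt(2 * b * p / 3) +
               t_rto * 3 * sqrt(3 * b * p / 8) * p * (1 + 32 * p * p))
      return 8 * s / denom

  for p in (0.005, 0.02, 0.05):
      print(p, round(tcp_friendly_rate_bps(p)))
  # roughly 1.6, 0.7, and 0.35 Mbit/s with these assumed values

So, per this equation, a flow that stops reacting below 2% loss is
claiming on the order of what a single TCP flow would get at that loss
rate; as more TCP flows join, the loss rate rises and that share shrinks.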

>> Mark - this is almost completely off the top of my head. I believe that
>> failing any of these things will probably mean that we will have a
>> system that is hard to deploy in practice, because quality will just not
>> be good enough, or the harm to others will be too great.
>>
>> There are many ways of getting into trouble that won't be detected in
>> the simple setups below, but we may want to leave those aside until
>> we've shown that these "deceptively simple" cases can be made to work.
>>
>> We might think of this as "test driven development" - first define a
>> failure case we can measure, then test if we're failing in that
>> particular way - if no, proceed; if yes, back to the drawing board.
>
> Or analyze why it failed and update the test or criteria.
>

I agree with this methodology.

> We also need to cover our internal data streams, and multiple streams from
> the same client (as opposed to two separate clients as above).



Varun
-- 
http://www.netlab.tkk.fi/~varun/

