[RTW] Review of draft-holmberg-rtcweb-ucreqs-00 (Web Real-Time Communication Use-cases and Requirements)

Christer Holmberg christer.holmberg at ericsson.com
Thu Mar 10 13:57:03 CET 2011


Hi Harald,

>Section 5.2 lists a number of requirements, but doesn't link 
>them back to use cases. For some, this is obvious (they all 
>need them); for others, less so. In cases where only one or 
>two scenarios are the basis for the recommendation, linking 
>would be good.

We can take care of that in the next version.

----

>There's also some inconsistency between "MUST" and "must" - 
>are they intended to mean the same thing here?

They are intended to mean the same thing.

----

>Some comments:
> 
>F9: echo cancellation MUST be provided. Is this "provided" as 
>in "made available", or "provided" as "must be used"? There 
>are situations (headsets are one) where echo cancellation is 
>not needed.

"Made available".

I can modify the requirement to make it more clear.

----

>F13: The browser MUST be able to pan, mix and render several 
>concurrent video streams.
>"Render" is obvious, "mix" is a prerequisite for "render" for 
>n > # of speakers, but what is "pan", and why do we need it?

"Panning" is the capability to control the direction/point from 
which a user experiences a sound to originate.

If you have several incoming mono audio streams and stereo (or better)
playout, you could, when playing the mono streams, create the impression 
that they are coming from different directions in the room.

This enhances intelligibility in multiparty situations (motivated by
the multiparty use case). 

The W3C Audio XG (becoming a WG) has done some work that could be re-used.

There is an early Chrome/Safari implementation 
<http://chromium.googlecode.com/svn/trunk/samples/audio/index.html> 
of one API for this. There is also a Mozilla
implementation (using another API).
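As a rough illustration of what panning amounts to computationally (this is a generic equal-power pan-law sketch, not the draft's mandated API, and the function name is made up for this example):

```javascript
// Equal-power panning: map a pan position in [-1, 1] (full left .. full
// right) to per-channel gains so perceived loudness stays roughly constant.
function panGains(pan) {
  const theta = (pan + 1) * Math.PI / 4; // 0 .. pi/2
  return { left: Math.cos(theta), right: Math.sin(theta) };
}

// Place three mono participants at distinct positions in the stereo field:
const positions = [-0.8, 0, 0.8];
const gains = positions.map(p => panGains(p));
// A centered source (pan = 0) gets roughly equal gain (~0.707) per channel.
```

Each mono stream's samples would then be scaled by its left/right gains before summing into the stereo output, which is how the "different directions in the room" impression is created.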

----

>F15: The browser MUST be able to process and mix sound 
>objects with audio streams.
>What is a "sound object", and in which scenario did this one occur?

A sound object is media that is retrieved from a source other than the 
established media stream(s) with the peer(s). It appears in the game example 
(section 4.4), where the sound of the tank might be generated locally,
but needs to be mixed with other media received over established media 
streams.

I can modify the requirement to make it more clear.
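Conceptually, the mixing in the game example is just summing sample buffers; a minimal sketch (function and parameter names are invented for illustration):

```javascript
// Mix a locally generated "sound object" (e.g. the tank sound from the
// game example) into a decoded audio stream by summing samples and
// clamping the result to the valid [-1, 1] range.
function mix(streamSamples, objectSamples, objectGain = 1.0) {
  const n = Math.max(streamSamples.length, objectSamples.length);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const s = (streamSamples[i] || 0) + objectGain * (objectSamples[i] || 0);
    out[i] = Math.min(1, Math.max(-1, s)); // clamp to avoid clipping overflow
  }
  return out;
}
```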

----
 
>F18: Which use case mandates the audio media format commonly 
>supported by existing telephony services (G.711?), and why is 
>this a MUST? Is it impossible (as opposed to just expensive) 
>to handle this requirement by a transcoding gateway?

The requirement is based on the Telephony use-case, and the wish to 
interoperate with legacy.
 
The requirement can of course be met by transcoding, but the idea is 
to avoid that. I thought that was the reason we have been trying to 
agree on a base codec in general.

----

>A5: The web application MUST be able to control the media 
>format (codec) to be used for the streams sent to a peer. I 
>think the MUST is that the sender and recipient need to be 
>able to find a common codec, if one exists; I'm not sure I 
>see a MUST for the webapp actually picking one.

First, the sender and recipient need to be able to perform 
codec negotiation, in order to find the common codecs.
 
If the codec negotiation is handled by the web application 
(i.e. JavaScript based) the API must support this. 

If the codec negotiation is handled by the browser, then the app 
might not need to have as much control. 

We try to cover that in the note associated with A5.
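The "find the common codecs" step above could be sketched as a simple preference-ordered intersection (codec names and function name are illustrative, not from the draft):

```javascript
// Intersect our preference-ordered codec list with what the peer
// reports supporting, preserving our preference order.
function commonCodecs(ourPreferences, peerSupported) {
  const peer = new Set(peerSupported);
  return ourPreferences.filter(codec => peer.has(codec));
}

const offered = ['opus', 'G.722', 'PCMU'];
const answered = ['PCMU', 'opus'];
// First common codec in our preference order:
const chosen = commonCodecs(offered, answered)[0]; // 'opus'
```

Whether this selection happens in JavaScript or inside the browser is exactly the distinction made above; in the former case the API has to expose the candidate lists.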

----

>In section 7.1, "security introduction", I think it would be 
>more accurate to say that "this section will in the future 
>describe"... there will be more text here as we get down to 
>the details. Offhand, stuff that should get into section 7.2 
>(browser):
> 
>- the browser has to provide mechanisms to assure that 
>streams are the ones the recipient intended to receive, and 
>signal to sender that it's ok to start sending media (this 
>translates to "STUN handshake" in currently-imagined implementations)
>- the browser has to ensure that sender doesn't begin to emit 
>media until the stream has been OKed ("stun handshake 
>completed" is the currently imagined implementation)
>- the browser has to ratelimit the # of attempts to negotiate 
>a stream, so that this itself isn't a DOS attack
>- the browser should ensure that recipient-specified limits 
>on send rate are not exceeded
>- it would be nice if the browser could keep some secrets 
>from the Javascript so that it's not possible for a malicious 
>webapp to use permission obtained from one interaction to get 
>authorization for sending media from somewhere else (this may 
>be impossible, however)

Thanks for the input! We'll use it in the next version.
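On the rate-limiting point above: one common way a browser could bound negotiation attempts is a token bucket (a generic sketch with made-up names and parameters, not something the draft specifies):

```javascript
// Rate-limit stream-negotiation attempts so that negotiation itself
// cannot be used as a DOS vector: a simple token bucket.
class NegotiationLimiter {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;           // max burst of attempts
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.last = Date.now();
  }
  tryNegotiate(now = Date.now()) {
    const elapsed = (now - this.last) / 1000;
    this.last = now;
    this.tokens = Math.min(this.capacity,
                           this.tokens + elapsed * this.refillPerSecond);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;                      // attempt allowed
    }
    return false;                       // too many recent attempts; reject
  }
}
```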

----
 
>There will be more here. Good start!

Thanks for your comments!

Regards,

Christer
