[RTW] Charter proposal: The activity hitherto known as "RTC-WEB at IETF"

Harald Alvestrand harald at alvestrand.no
Fri Jan 7 14:30:39 CET 2011


On 01/06/11 19:43, Ted Hardie wrote:
> Hi Harald,
>
> Thanks for putting this together; some discussion in-line.
>
> On Thu, Jan 6, 2011 at 3:53 AM, Harald Alvestrand<harald at alvestrand.no>  wrote:
>> This is the first of 3 messages going to the DISPATCH list (in the hope of
>> keeping discussions somewhat organized).
>>
>> This is the draft of a charter for an IETF working group to consider the
>> subject area of "Real time communication in the Web browser platform". This
>> is one of a paired set of activities, the other one being a W3C activity
>> (either within an existing WG or in a new WG) that defines APIs to this
>> functionality.
>>
>> The two other messages will contain the W3C proposed charter and a kickoff
>> for what's usually the most distracting topic in any such discussion: The
>> name of the group.
>> Without further ado:
>>
>> -------------------------------------
>>
>> Version: 2
>>
>> Possible Names:
>> <This space deliberately left blank for later discussion>
>>
>> Body:
>>
>> Many implementations have been made that use a Web browser to support
>> interactive communications directly between users, including voice, video,
>> collaboration and gaming, but until now, such applications have required the
>> installation of nonstandard plugins and browser extensions. There is a
>> desire to standardize such functionality, so that this type of application
>> can be run in any compatible browser.
>>
> In at least some of the contexts identified above, there is a
> long-lived identifier
> associated with the individuals who will have interactive
> communications; that is,
> there is a context-specific presence architecture in addition to the
> implementation-specific
> real-time communications.  Though the text below occasionally says "users",
> there does not seem to be any work being defined that would touch on this.
> I see that in the last sentence you explicitly rule it out of scope.
> Without this,
> this seems to be limited to an architecture where a mediating server provides
> the initial signaling path.  If that is the scope, I think that should be made
> explicit as a design statement, not inferred from the lack of presence.
I think what you mean is that at this level, we're not taking a position 
on what an "user" is, or whether the concept of "user" even has meaning 
for the application (think chatroulette for one example of an 
application where there are no long-lived user identifiers).

I would certainly agree that this is a reasonable restriction of our 
work, and would welcome suggested text on how to write that into the 
charter.
>> Traditionally, the W3C has defined APIs and markup languages such as HTML
>> that work in conjunction with IETF on-the-wire protocols such as HTTP to
>> allow web browsers to display media that does not have real time
>> interactive constraints with another human.
>>
>> The W3C and IETF plan to collaborate in their traditional way to meet the
>> evolving needs of browsers. Specifically, the IETF will provide a set of
>> on-the-wire protocols, including RTP, to meet the needs of interactive
>> communications, and the W3C will define the API and markup to allow web
>> application developers to control the on-the-wire protocols. This will
>> allow application developers to write applications that run in a browser
>> and facilitate interactive communications between users for voice and
>> video communications, collaboration, and gaming.
>>
>> This working group will select and define a minimal set of protocols that
>> will enable browsers to:
>>
>> * have interactive real time voice and video between users using RTP
>> * interoperate with compatible voice and video systems that are not web
>> based
>> * support direct flows of non RTP application data between browsers for
>> collaboration and gaming applications
>>
> Okay "direct flows of non-RTP application data" goes beyond scope creep;
> to satisfy this completely, we are talking an end-point-to-end-point tunnel,
> which will run afoul of a lot of folks who dearly love packet inspection.  I
> would say that it would be best to first tackle the voice/video bits and then
> tackle this problem.  Re-charter the WG to cover this, in other words, after
> the first bit is done.
>
> I also note that you're using "between browsers", rather than "among browsers".
> Is it intended that this facility should allow for multi-party communication?
> Leaving aside the non-RTP issues, that would add floor control, media mixing,
> etc. to the task list.  Again, I think it should either be explicitly
> in-scope or out-of-scope.
I did not intend "between" to indicate that there were only two parties, but
I do think that we're building point-to-point communications, not N-to-N.

In one interpretation of the distinction ... I think we're 
standardizing point-to-point communications at this time, since all the 
proven-viable multipoint communication applications have been built out 
of point-to-point links rather than using underlying multipoint 
technologies like multicast.

In another interpretation ... I think we should limit ourselves to 
providing a toolbox. I think floor control (except for possibly 
providing the signalling channels floor control needs) is out of scope; 
MAITAI may be a better place to discuss media mixing.
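
To make that concrete: a small multiparty call can be assembled purely out of
point-to-point links, with each participant opening an independent connection
to every other participant and no multicast or central mixer assumed. A rough
TypeScript sketch - the PeerConnection-style interface here is invented purely
for illustration, not a proposed API:

// Hypothetical, illustration only: a point-to-point connection primitive.
interface PeerConnection {
  connect(remoteId: string): Promise<void>;
  close(): void;
}

// Assumed factory for the hypothetical primitive above.
declare function createPeerConnection(localId: string): PeerConnection;

// A small "conference" built as a full mesh of independent
// point-to-point links; no underlying multipoint transport is used.
async function joinMesh(localId: string, others: string[]): Promise<PeerConnection[]> {
  const links: PeerConnection[] = [];
  for (const remoteId of others) {
    const pc = createPeerConnection(localId);
    await pc.connect(remoteId);   // one point-to-point link per remote peer
    links.push(pc);
  }
  return links;
}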

>> Fortunately, very little new protocol development is required at the IETF
>> for this; only the selection of existing protocols and of minimum
>> capabilities to ensure interoperability. The following protocols are
>> candidates for inclusion in the profile set:
>>
>> 1) RTP/ RTCP
>>
>> 2) a baseline audio codec for high quality interactive audio. Opus
>> will be considered as one of the candidates
>>
>> 3) a baseline audio codec for PSTN interoperability. G.711 and iLBC
>> will be considered
>>
>> 4) a baseline video codec. H.264 and VP8 will be considered
>>
>> 5) Diffserv based QoS
>>
>> 6) NAT traversal using ICE
>>
> The ICE spec is clear that the NAT traversal it provides does not
> help establish the signaling between agents.  For this browser-to-browser
> mechanism, if we are limiting this to situations where a mediating web server
> provides the initial signaling path, that's okay.  If there is a
> different model, though,
> we may need additional tools here.
If we can make the mediating-web-server model work, I think we have 
success; if we can't make it work, we have a failure. So obviously 
that's my first priority. What text changes do you think are needed?
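
To spell out what I mean by the mediating-web-server model (a sketch only; the
SignalingChannel and Session names are invented for this example, not proposed
APIs): the web server that served the page also relays the offer/answer
signalling between the two browsers, and ICE is then used to establish the
media path directly between them.

// Illustration only; both interfaces are hypothetical.
interface SignalingChannel {
  send(msg: string): void;              // e.g. HTTP or WebSocket back to the web server
  onmessage: (msg: string) => void;     // messages relayed from the other browser
}

interface Session {
  createOffer(): Promise<string>;              // description of local media, ICE candidates included
  acceptAnswer(answer: string): Promise<void>;
}

// The mediating web server carries the signalling; the media itself
// flows browser-to-browser once ICE has found a working path.
async function call(session: Session, server: SignalingChannel): Promise<void> {
  const offer = await session.createOffer();
  server.send(offer);                          // signalling goes via the server
  server.onmessage = (answer) => { session.acceptAnswer(answer); };
}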
>> 7) RFC 4833 based DTMF transport
>>
>> 8) RFC 4574 based Label support for identifying a stream's purpose
>>
>> 9) Secure RTP and keying
>>
>> 10) support for IPv4, IPv6 and dual stack browsers
>>
>> The working group will cooperate closely with the W3C activity that
>> specifies a semantic level API that allows the control and manipulation of
>> all the functionality above. In addition, the API needs to communicate state
>> information and events about what is happening in the browser that to
>> applications running in the browser.
> I think the sentence above has some extra words; is the browser talking
> to an application running in the browser?  Can it be re-phrased?
I think there's an extra "that" - when an event happens at the 
communications interface, this event needs to be propagated over the API 
to the Javascript that is running in a Web page - that Web page is what 
I've been thinking of as "the application". (There are other efforts under 
way that want to run "web pages" in contexts other than a currently open 
tab in a browser, but for the moment, "a tab in a browser", or even "a set 
of tabs in a browser" is a reasonable synonym for "application", I think).
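
As a purely illustrative sketch of what that event surface might look like to
the Javascript in the page (the names below are invented for the example, not
a proposal), expressed in TypeScript:

// Hypothetical event surface; invented names, not a proposed API.
interface RtpStats { packetsSent: number; packetsReceived: number; jitterMs: number; }

interface MediaSessionEvents {
  ondtmf: (digit: string) => void;                 // an RFC 4833 DTMF event was received
  onstats: (stats: RtpStats) => void;              // periodic RTP/RTCP statistics
  ondtlsstatechange: (state: string) => void;      // DTLS/SRTP state
  onsignalingstatechange: (state: string) => void; // signalling state
}

// The application - the Javascript running in the web page - just
// registers handlers and reacts to whatever the browser reports.
function wireUp(session: MediaSessionEvents): void {
  session.ondtmf = (digit) => console.log("DTMF digit received:", digit);
  session.onstats = (s) => console.log("RTP/RTCP stats:", s.packetsSent, s.packetsReceived, s.jitterMs);
  session.ondtlsstatechange = (state) => console.log("DTLS/SRTP state:", state);
  session.onsignalingstatechange = (state) => console.log("signalling state:", state);
}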
>> These events and state need to include
>> information such as: receiving RFC 4833 DTMF, RTP and RTCP statistics, state
>> of DTLS/SRTP,  and signalling state.
>>
>> The following topics will be out of scope for the initial phase of the WG
>> but could be added after a recharter: RTSP, RSVP, NSIS, LOST, Geolocation,
>> IM & Presence, Resource Priority.
>>
>> Milestones:
>>
>> February 2011 Candidate "sample" documents circulated to DISPATCH
>>
>> March 2011 BOF at IETF Prague
>>
>> April 2011 WG charter approved by IESG. Chosen document sets adopted as WG
>> documents
>>
>> May 2011 Functionality to include and main alternative protocols identified
>>
>> July 2011 IETF meeting
>>
>> Aug 2011 Draft with text reflecting agreement on what the protocol set
>> should be
>>
>> Nov 2010 Documentation specifying mapping of protocol functionality to
>> W3C-specified API produced
>>
>> Dec 2011 Protocol set specification to IESG
>>
>> April 2012 API mapping document to IESG
>>
> Pretty aggressive, given the number of moving parts.  It's also not
> clear why the November 2011 doc (I assume 2010 is a typo) is
> done in the IETF, rather than the W3C.  Or is it joint work?
More or less done this way at random. I think of this as joint work; 
documents may move between the organizations as we progress, but I'm not 
sure how the W3C document model works yet.


