[RTW] Charter proposal: The activity hitherto known as "RTC-WEB at IETF"

Ted Hardie ted.ietf at gmail.com
Mon Jan 10 20:19:53 CET 2011


Hi Harald,

Some replies in-line.

On Fri, Jan 7, 2011 at 5:30 AM, Harald Alvestrand <harald at alvestrand.no> wrote:
> On 01/06/11 19:43, Ted Hardie wrote:
>>
>> Hi Harald,
>>
>> Thanks for putting this together; some discussion in-line.
>>
>> On Thu, Jan 6, 2011 at 3:53 AM, Harald Alvestrand<harald at alvestrand.no>
>>  wrote:
>>>
>>> This is the first of 3 messages going to the DISPATCH list (in the hope
>>> of
>>> keeping discussions somewhat organized).
>>>
>>> This is the draft of a charter for an IETF working group to consider the
>>> subject area of "Real time communication in the Web browser platform".
>>> This
>>> is one of a paired set of activities, the other one being a W3C activity
>>> (either within an existing WG or in a new WG) that defines APIs to this
>>> functionality.
>>>
>>> The two other messages will contain the W3C proposed charter and a
>>> kickoff
>>> for what's usually the most distracting topic in any such discussion: The
>>> name of the group.
>>> Without further ado:
>>>
>>> -------------------------------------
>>>
>>> Version: 2
>>>
>>> Possible Names:
>>> <This space deliberately left blank for later discussion>
>>>
>>> Body:
>>>
>>> Many implementations have been made that use a Web browser to support
>>> interactive communications directly between users including voice, video,
>>> collaboration and gaming, but until now, such applications have required
>>> the
>>> installation of nonstandard plugins and browser extensions. There is a
>>> desire to standardize such functionality, so that this type of
>>> application
>>> can be run in any compatible browser.
>>>
>> In at least some of the contexts identified above, there is a
>> long-lived identifier
>> associated with the individuals who will have interactive
>> communications; that is,
>> there is a context-specific presence architecture in addition to the
>> implementation-specific
>> real-time communications.  Though the text below occasionally says
>> "users",
>> there does not seem to be any work being defined that would touch on this.
>> I see that in the last sentence you explicitly rule it out of scope.
>> Without this,
>> this seems to be limited to an architecture where a mediating server
>> provides
>> the initial signaling path.  If that is the scope, I think that should be
>> made
>> explicit as a design statement, not inferred from the lack of presence.
>
> I think what you mean is that at this level, we're not taking a position on
> what an "user" is, or whether the concept of "user" even has meaning for the
> application (think chatroulette for one example of an application where
> there are no long-lived user identifiers).
>
> I would certainly agree that this is a reasonable restriction of our work,
> and would welcome suggested text on how to write that into the charter.
>>>

How about:

Many implementations have been made that use a Web browser to support
direct, interactive communications, including voice, video,
collaboration, and gaming.  In these implementations, the web server
acts as the signaling path between these applications, using locally
significant identifiers to set up the association.  Until now, such
applications have typically required the installation of plugins or
non-standard browser extensions.  There is a desire to standardize this
functionality, so that this type of application can be run in any
compatible browser.
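To make the mediating-web-server model concrete, here is a minimal
sketch of a relay that forwards signaling messages between two endpoints
sharing a locally significant session identifier. All names here
(SignalingRelay, join, send) are illustrative assumptions, not from any
charter or specification; the point is only that the server forwards
opaque signaling and never touches media.

```javascript
// Hypothetical sketch of a mediating web server's signaling role:
// it maps locally significant session identifiers to endpoints and
// forwards signaling messages between them.
class SignalingRelay {
  constructor() {
    this.sessions = new Map(); // sessionId -> array of deliver callbacks
  }

  // An endpoint joins a session under a locally significant identifier;
  // `deliver` is the callback the relay uses to push messages to it.
  // Returns the endpoint's index within the session.
  join(sessionId, deliver) {
    const peers = this.sessions.get(sessionId) || [];
    peers.push(deliver);
    this.sessions.set(sessionId, peers);
    return peers.length - 1;
  }

  // Forward a signaling message (e.g. an offer or answer) from one
  // endpoint to every other peer in the same session.
  send(sessionId, fromIndex, message) {
    const peers = this.sessions.get(sessionId) || [];
    peers
      .filter((_, i) => i !== fromIndex)
      .forEach((deliver) => deliver(message));
  }
}
```

Once the association is set up this way, media would flow directly
between the browsers; only the rendezvous goes through the server.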

>>> Traditionally, the W3C has defined APIs and markup languages such as
>>> HTML that work in conjunction with IETF on-the-wire protocols such as
>>> HTTP to allow web browsers to display media that does not have real
>>> time interactive constraints with another human.
>>>
>>> The W3C and IETF plan to collaborate in their traditional way to meet
>>> the evolving needs of browsers. Specifically, the IETF will provide a
>>> set of on-the-wire protocols, including RTP, to meet the needs of
>>> interactive communications, and the W3C will define the API and markup
>>> to allow web application developers to control the on-the-wire
>>> protocols. This will allow application developers to write
>>> applications that run in a browser and facilitate interactive
>>> communications between users for voice and video communications,
>>> collaboration, and gaming.
>>>
>>> This working group will select and define a minimal set of protocols
>>> that will enable browsers to:
>>>
>>> * have interactive real-time voice and video between users using RTP
>>> * interoperate with compatible voice and video systems that are not
>>> web-based
>>> * support direct flows of non-RTP application data between browsers
>>> for collaboration and gaming applications
>>>
>> Okay, "direct flows of non-RTP application data" goes beyond scope
>> creep; to satisfy this completely, we are talking about an
>> endpoint-to-endpoint tunnel, which will run afoul of a lot of folks
>> who dearly love packet inspection.  I would say that it would be best
>> to first tackle the voice/video bits and then tackle this problem.
>> Re-charter the WG to cover this, in other words, after the first bit
>> is done.
>>
>> I also note that you're using "between browsers", rather than "among
>> browsers".
>> Is it intended that this facility should allow for multi-party
>> communication?
>> Leaving aside the non-RTP issues, that would add floor control, media
>> mixing,
>> etc. to the task list.  Again, I think it should either be explicitly
>> in-scope or out-of-scope.
>
> I did not intend "between" to indicate that there were only two
> parties, but I think that we're building point-to-point communications,
> not N-to-N.
>
> In one interpretation of the distinction .... I think we're standardizing
> point-to-point communications at this time, since all the proven-viable
> multipoint communication applications have been built out of point-to-point
> links rather than using underlying multipoint technologies like multicast.
>
> In another interpretation ... I think we should limit ourselves to providing
> a toolbox. I think floor control (except for possibly providing the
> signalling channels floor control needs) is out of scope; MAITAI may be a
> better place to discuss media mixing.
>

So, maybe this would be clearer if I asked it in a different way.  Are
you thinking of something like websockets run between two peers?
Or are you thinking of a layered tunnel model?
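To illustrate the distinction I'm asking about: a websockets-style
channel is message-oriented, with the transport preserving message
boundaries, whereas a tunnel carries an undifferentiated byte stream.
Here is a sketch, under that assumption, of the framing a
message-oriented channel implies (length-prefixed messages, Node Buffer
API; function names are illustrative only):

```javascript
// Message model: each application message is delimited with a 4-byte
// big-endian length prefix, so the receiver can recover message
// boundaries from the byte stream.
function frameMessage(payload) {
  const body = Buffer.from(payload, "utf8");
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Receiver side: split a received byte stream back into the original
// messages. A raw tunnel would have no such structure to recover.
function deframe(stream) {
  const messages = [];
  let offset = 0;
  while (offset + 4 <= stream.length) {
    const len = stream.readUInt32BE(offset);
    messages.push(stream.subarray(offset + 4, offset + 4 + len).toString("utf8"));
    offset += 4 + len;
  }
  return messages;
}
```

The answer matters for the packet-inspection concern above: framed
application messages are at least in principle inspectable, while an
opaque tunnel is not.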

>>> Fortunately, very little development of new protocols at the IETF is
>>> required for this, only selection of existing protocols and selection
>>> of minimum capabilities to ensure interoperability. The following
>>> protocols are candidates for inclusion in the profile set:
>>>
>>> 1) RTP/ RTCP
>>>
>>> 2) a baseline audio codec for high quality interactive audio. Opus
>>> will be considered as one of the candidates
>>>
>>> 3) a baseline audio codec for PSTN interoperability. G.711 and iLBC
>>> will be considered
>>>
>>> 4) a baseline video codec. H.264 and VP8 will be considered
>>>
>>> 5) Diffserv based QoS
>>>
>>> 6) NAT traversal using ICE
>>>
>> The ICE spec is clear that the NAT traversal it provides does not
>> help establish the signaling between agents.  For this browser-to-browser
>> mechanism, if we are limiting this to situations where a mediating web
>> server
>> provides the initial signaling path, that's okay.  If there is a
>> different model, though,
>> we may need additional tools here.
>
> If we can make the mediating-web-server model work, I think we have success;
> if we can't make it work, we have a failure. So obviously that's my first
> priority. What text changes do you think are needed?

If the text I suggested above, clarifying that this is the
mediating-web-server model, works for you, I don't think additional
text is needed here.
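As one concrete piece of what ICE brings once the signaling path exists:
each gathered candidate gets a priority, computed with the formula RFC
5245 recommends, and the type-preference values below are the RFC's
recommended defaults. Everything else in this sketch (the function name,
the object layout) is an illustrative assumption.

```javascript
// Recommended type preferences from RFC 5245 section 4.1.2.2.
const TYPE_PREFERENCE = {
  host: 126,               // candidate from a local interface
  "peer-reflexive": 110,   // address learned from a peer's check
  "server-reflexive": 100, // address learned via a STUN server
  relayed: 0,              // address allocated on a TURN relay
};

// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function candidatePriority(type, localPref, componentId) {
  return (
    16777216 * TYPE_PREFERENCE[type] + // 2^24; multiply to avoid
    256 * localPref +                  // 32-bit shift overflow in JS
    (256 - componentId)
  );
}
```

The priorities drive which candidate pairs get checked first, so a
direct host-to-host path is preferred over a relayed one whenever the
NATs allow it.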

>>>
>>> 7) RFC 4733 based DTMF transport
>>>
>>> 8) RFC 4574 based Label support for identifying streams' purposes
>>>
>>> 9) Secure RTP and keying
>>>
>>> 10) support for IPv4, IPv6 and dual stack browsers
>>>
>>> The working group will cooperate closely with the W3C activity that
>>> specifies a semantic level API that allows the control and manipulation
>>> of
>>> all the functionality above. In addition, the API needs to communicate
>>> state
>>> information and events about what is happening in the browser that to
>>> applications running in the browser.
>>
>> I think the sentence above has some extra words; is the browser
>> talking to an application running in the browser?  Can it be
>> re-phrased?
>
> I think there's an extra "that" - when an event happens at the
> communications interface, this event needs to be propagated over the
> API to the Javascript that is running in a Web page - that Web page is
> what I've been thinking of as "the application". (There are other
> efforts that want to run "web pages" in contexts other than a currently
> open tab in a browser, but for the moment, "a tab in a browser", or
> even "a set of tabs in a browser", is a reasonable synonym for
> "application", I think.)
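That matches my reading. A sketch of that propagation model, purely as
an assumption about shape and not a proposed API (the names CommChannel,
addEventListener, and the state strings are all hypothetical):

```javascript
// Hypothetical sketch: the browser's communications interface fans
// events out to handlers registered by the page ("the application").
class CommChannel {
  constructor() {
    this.listeners = { statechange: [] };
    this.state = "new";
  }

  // The page registers interest in an event type.
  addEventListener(type, handler) {
    this.listeners[type].push(handler);
  }

  // Called by the (simulated) communications layer; delivers the event
  // to every handler the page registered for that type.
  _emit(type, detail) {
    for (const handler of this.listeners[type]) handler(detail);
  }

  // Example transition the browser would surface to the page, e.g. when
  // DTLS/SRTP setup completes.
  _connectionEstablished() {
    this.state = "connected";
    this._emit("statechange", { state: this.state });
  }
}
```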
>>
>>> These events and state need to include
>>>
>>> information such as: receiving RFC 4733 DTMF, RTP and RTCP
>>> statistics, state of DTLS/SRTP, and signalling state.
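For reference, the DTMF transport named here carries digits as RTP
"telephone-event" payloads rather than as audio; the current
specification for that payload is RFC 4733. A sketch of its 4-byte
payload layout (event, end bit, volume, duration), with an encoder name
that is illustrative only:

```javascript
// RFC 4733 telephone-event payload:
//   byte 0: event code (0-15 for DTMF digits 0-9, *, #, A-D)
//   byte 1: E bit (end of event), R bit (reserved, zero), 6-bit volume
//   bytes 2-3: duration in RTP timestamp units, big-endian
function encodeTelephoneEvent(event, endOfEvent, volume, duration) {
  const buf = Buffer.alloc(4);
  buf.writeUInt8(event, 0);
  buf.writeUInt8(((endOfEvent ? 1 : 0) << 7) | (volume & 0x3f), 1);
  buf.writeUInt16BE(duration, 2);
  return buf;
}
```

The "receiving DTMF" event above would then be the browser surfacing a
decoded payload like this to the page.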
>>>
>>> The following topics will be out of scope for the initial phase of
>>> the WG but could be added after a recharter: RTSP, RSVP, NSIS, LOST,
>>> Geolocation, IM & Presence, and Resource Priority.
>>>
>>> Milestones:
>>>
>>> February 2011 Candidate "sample" documents circulated to DISPATCH
>>>
>>> March 2011 BOF at IETF Prague
>>>
>>> April 2011 WG charter approved by IESG. Chosen document sets adopted as
>>> WG
>>> documents
>>>
>>> May 2011 Functionality to include and main alternative protocols
>>> identified
>>>
>>> July 2011 IETF meeting
>>>
>>> Aug 2011 Draft with text reflecting agreement of what the protocol set
>>> should be
>>>
>>> Nov 2010 Documentation specifying mapping of protocol functionality to
>>> W3C-specified API produced
>>>
>>> Dec 2011 Protocol set specification to IESG
>>>
>>> April 2012 API mapping document to IESG
>>>
>> Pretty aggressive, given the number of moving parts.  It's also not
>> clear why the November 2011 doc (I assume 2010 is a typo) is
>> done in the IETF, rather than the W3C.  Or is it joint work?
>
> More or less done this way at random. I think of this as joint work;
> documents may move between the organizations as we progress, but I'm not
> sure how the W3C document model works yet.
>
>
I think that we need to be a bit more careful about the dependencies,
as it has not been my experience that joint work results in a congruent
set of people working on it in the two organizations.  Can we make the
deliverable for the November 2011 doc the release of a draft, with a
note that it is intended as a substrate for the W3C work?  That would
give the W3C work a dependency on this draft's delivery.

regards,

Ted Hardie


More information about the RTC-Web mailing list