[RTW] peer-to-peer connection API thoughts

Fri Jan 7 14:03:28 CET 2011

Matt,
thanks for these thoughts!

On 01/05/11 03:01, Matthew Kaufman wrote:
> I spent the holidays thinking about various ways of dealing with the 
> connection API problem, and I thought I might put those thoughts out 
> for some discussion rather than trying to pick one and write it up as 
> a draft specification.
>
> There are obviously a few models one might build upon for how to 
> represent the peer-to-peer connections and bring them to life. The 
> WebSockets API ( http://dev.w3.org/html5/websockets/ ) is a solution 
> to the client-server connection problem, and the way Flash Player 
> extended its NetConnection and NetStream objects to support 
> peer-to-peer connectivity over RTMFP ( 
> http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStream.html 
> ) is another model. And of course 
> draft-alvestrand-dispatch-rtcweb-datagram ( 
> http://tools.ietf.org/html/draft-alvestrand-dispatch-rtcweb-datagram-00 ) 
> hints at how a URL scheme might be applied to some of this problem.
>
> I believe the following items are givens, though feedback is of course 
> welcome on these points as well:
>
> 1. When it is possible to establish a peer-to-peer connection for 
> media, that should be supported (and is probably preferred)
>
> 2. Supporting peer-to-peer connections for media in the face of NA(P)T 
> and firewalls will require UDP, and UDP hole-punching techniques (UDP 
> is also preferable for delay-sensitive media)
>
> 3. Enabling a browser to send UDP datagrams to script- or 
> server-designated IP addresses and ports is unsafe unless a handshake 
> is used to verify that the other end is a willing participant
>
> 4. STUN Connectivity Check probes and responses (with short-term 
> credentials) are a sufficiently strong handshake for the above (and 
> this then suggests that ICE or a variant thereof is a good approach 
> for the NAT traversal)
>
> 5. Given NAT and firewalls, it will also be necessary to support 
> relayed UDP traffic; that is to say, a traffic flow which is 
> peer-relay server-peer
>
> 6. Given NAT and firewalls, it will further be necessary to support 
> transporting the same media traffic over a TCP connection, for cases 
> where UDP is blocked. This could be a TCP tunnel of the UDP datagrams, 
> WebSockets, or something else.
I like this list!
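
As I read points 1, 5 and 6 together, they imply a preference order among the paths that turn out to work: direct UDP first, relayed UDP second, TCP last. A minimal sketch of that reading follows; the labels and the pickPath() helper are names I made up for illustration, not part of any proposal.

```javascript
// Assumed preference order; the labels are invented for this sketch.
const PREFERENCE = ["udp-direct", "udp-relayed", "tcp"];

// `reachable` is a Set of the path types that connectivity checks found
// to work. Returns the most preferred one, or null if nothing works.
function pickPath(reachable) {
  for (const path of PREFERENCE) {
    if (reachable.has(path)) return path;
  }
  return null;
}

// Example: UDP works only through a relay, and TCP also works.
const chosen = pickPath(new Set(["tcp", "udp-relayed"]));
console.log(chosen);
```

Whether the choice is made by the browser or by the script is exactly the difference between the models below.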
>
> A few possible programming models to support this then present 
> themselves, though this is by no means an exhaustive list...
>
> A. A single connection object which can magically make everything work
>
> In this scenario the API is essentially "var connection = new 
> PeerConnection(destinationSpecifier);".
>
> The destinationSpecifier would contain enough information for the 
> PeerConnection object's implementation to: use ICE to attempt to open 
> a direct or relayed connection over UDP, or failing that, TCP. It 
> would contain the information necessary to communicate with an 
> intermediary server as the ICE negotiation proceeds as well (a "low 
> bandwidth" signaling channel is required between a pair of peers in 
> many cases for exchange of pertinent information before the media 
> connection can be established).

For very simple applications, this might be a sufficient API. But even 
if the object is magical, I think its API has to be quite a bit richer - 
in particular, when the state of the connection changes (failover to new 
connections, data doesn't flow any more, data suddenly flows a great 
deal faster than before) the changes have to be signalled back (using 
callbacks?) to the client JavaScript so that appropriate user interface 
events can happen.

When I yank out my network cable, and don't have wireless, I expect an 
error message saying "connection lost"; I don't expect things to just 
freeze. So the extended API for establishing a connection would be 
something like

   var connection = new PeerConnection(destinationSpecifier,
                                       changeCallback, deathCallback, ....)
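
To make that concrete, here is a rough mock of how the callbacks might be driven. Everything in it is an assumption for illustration: the state names, the callback arguments, and the two underscore-prefixed methods that stand in for whatever the browser's transport layer would do internally. No network traffic is involved.

```javascript
// Hypothetical mock: PeerConnection, its states and its callback
// arguments are assumptions for illustration, not an existing API.
class PeerConnection {
  constructor(destinationSpecifier, changeCallback, deathCallback) {
    this.destination = destinationSpecifier;
    this.changeCallback = changeCallback;
    this.deathCallback = deathCallback;
    this.state = "connecting";
  }

  // Stand-in for the transport layer reporting a change, e.g. failover
  // to a new connection or a large change in throughput.
  _transportChanged(newState, detail) {
    if (this.state === "dead") return; // nothing more to report
    this.state = newState;
    this.changeCallback(newState, detail);
  }

  // Stand-in for the transport layer giving up on the connection.
  _transportDied(reason) {
    if (this.state === "dead") return;
    this.state = "dead";
    this.deathCallback(reason);
  }
}

// Usage: the page keeps its user interface in step with the connection.
const log = [];
const connection = new PeerConnection(
  "peer-destination-placeholder",
  (state, detail) => log.push(`change: ${state} (${detail})`),
  (reason) => log.push(`connection lost: ${reason}`)
);
connection._transportChanged("connected", "direct UDP");
connection._transportChanged("connected", "failover to relay");
connection._transportDied("network cable unplugged");
console.log(log.join("\n"));
```

The point is only that the script hears about both gradual changes and the final failure, so it can show "connection lost" instead of freezing.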
>
> B. A single connection object which does most of the magic, but with 
> the low-bandwidth signaling made external through events
>
> In this case, the API is the same as above, but rather than providing 
> in the destinationSpecifier some sort of server address for the "low 
> bandwidth signaling" the connection object would instead fire events 
> containing (essentially opaque to the channel, but in an agreed 
> format) the data which needed to be transported to the other side. A 
> corresponding API on the far end's "listening" connection object would 
> allow for the data to be injected at the far end. The script 
> implementer would simply need to wire the event up to an existing 
> transport mechanism (HTTP, WebSockets).
I'm not familiar enough with this form of API to interpret that - what 
are these events, where would the events go, and who would handle them? 
Or are "events" JavaScript terminology for what I'm used to calling 
"callbacks"?
>
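
If "events" here means the usual browser pattern, where a handler registered with addEventListener plays the role of a callback, then my reading of B is roughly the sketch below. EventTarget, Event, addEventListener and dispatchEvent are the standard DOM interfaces; SignalingPeer, the "signal" event name, produceSignal() and injectSignal() are names I invented, and the "transport" is a plain function call standing in for HTTP or WebSockets.

```javascript
// Hypothetical sketch of model B. SignalingPeer, the "signal" event and
// injectSignal() are invented names; the payload is treated as opaque.
class SignalingPeer extends EventTarget {
  constructor(name) {
    super();
    this.name = name;
    this.received = [];
  }

  // The connection object has data for the far end: fire an event and
  // let the script decide how to carry it.
  produceSignal(blob) {
    const event = new Event("signal");
    event.data = blob;
    this.dispatchEvent(event);
  }

  // The far end's script hands the transported data back in.
  injectSignal(blob) {
    this.received.push(blob);
  }
}

const caller = new SignalingPeer("caller");
const listener = new SignalingPeer("listener");

// Wiring the event up to a transport: here a direct function call
// stands in for HTTP or a WebSocket.
caller.addEventListener("signal", (event) => {
  listener.injectSignal(event.data);
});

caller.produceSignal("opaque-candidate-info");
console.log(listener.received);
```

If that reading is right, the handler is a callback in all but name, and the script owns the signaling path.
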
> C. A connection object which does most of the magic, but with a parent 
> object that is the connection to the signaling mechanism
>
> This is the approach Flash Player has taken. The NetConnection is 
> opened to the rendezvous-handling server (which can also act as a 
> media relay); NetStreams are then opened either to that relaying 
> server or directly to other peers. These peer-to-peer NetStreams use 
> the open NetConnection for the peer setup signaling (determining the 
> far IP addresses, requesting NAT hole-punching, etc.)
Interesting - does this mean that one has an overarching object that can 
relate to multiple streams? Are there multiple streams active at the 
same time, and if so, what state do they share, and how are they different?

Pointers to specs would be welcome!
>
> D. Multiple connection objects
>
> In this case, the API is similar to case A for the UDP peer-to-peer 
> channel but a *different* API and connection object (perhaps an 
> extension of the existing WebSocket API) is used for the client-server 
> fallback cases. This requires the script to determine when a direct 
> connection cannot be made (which sometimes cannot be determined 
> without trying it out). Makes the programming model more complex, but 
> avoids duplication, as existing client-server models can be used for 
> all relayed (and server-based) media cases without there being two (or 
> more) different ways to move real-time media from a server to a 
> browser (and vice versa).
>
> E. Script-intensive (but flexible) components
>
> In this model, an API exists which exposes a nearly bare UDP socket, 
> but with permissions restricted such that traffic may not be sent 
> unless handshake probes are sent and received. Maximum flexibility, at 
> some significant cost in script complexity (though this could be 
> encapsulated into well-known, even open-source, libraries).
>
> An example API for this would be:
>   var connection = new PeerConnection(ICEusernameString, 
> ICEpasswordString);
>      opens the local UDP port, nothing is permitted at this point
>   PeerConnection.testConnectivity(farSockaddrString, 
> farUsernameString, farPasswordString);
>      sends STUN connectivity checks
>   PeerConnection.onConnectivityTest(receivedSockaddrString, 
> usernameString);
>      event that fires when a connectivity test is received with valid 
> credentials. Browser also automatically generates the transaction 
> response.
>   PeerConnection.onConnectivitySuccess(receivedSockaddrString, 
> reflexiveAddrString, usernameString);
>      event that fires when the connectivity test transaction response 
> arrives... Browser then adds this destination to a whitelist of 
> addresses that media streams may be sent to
>   PeerConnection.openMediaSession(sockaddrString)
>      fails immediately if the address isn't on the whitelist, 
> otherwise starts a DTLS handshake with the specified address
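
To check my understanding of the gating in E, here is a small mock of the whitelist behaviour. It sends nothing on the network. The method names follow the example above, but the class name GatedPeerConnection and the simulateSuccess() method (which stands in for a STUN transaction response arriving) are invented, and the addresses come from the documentation ranges.

```javascript
// Hypothetical mock of model E; no STUN or DTLS traffic is generated.
class GatedPeerConnection {
  constructor(iceUsername, icePassword) {
    this.iceUsername = iceUsername;
    this.icePassword = icePassword;
    this.pending = new Set();   // addresses with a check in flight
    this.whitelist = new Set(); // addresses that answered a check
    this.onConnectivitySuccess = null;
  }

  testConnectivity(farSockaddr, farUsername, farPassword) {
    // A real implementation would send a STUN connectivity check here.
    this.pending.add(farSockaddr);
  }

  // Invented stand-in for "the transaction response arrives".
  simulateSuccess(receivedSockaddr, reflexiveAddr, username) {
    // Only responses to checks we actually sent may whitelist an address.
    if (!this.pending.delete(receivedSockaddr)) return;
    this.whitelist.add(receivedSockaddr);
    if (this.onConnectivitySuccess) {
      this.onConnectivitySuccess(receivedSockaddr, reflexiveAddr, username);
    }
  }

  openMediaSession(sockaddr) {
    if (!this.whitelist.has(sockaddr)) {
      throw new Error("address not on whitelist: " + sockaddr);
    }
    // A real implementation would start the DTLS handshake here.
    return { peer: sockaddr, state: "handshaking" };
  }
}

const gated = new GatedPeerConnection("localUser", "localPass");
let refused = false;
try {
  gated.openMediaSession("192.0.2.10:5004"); // not yet probed
} catch (e) {
  refused = true;
}
gated.testConnectivity("192.0.2.10:5004", "farUser", "farPass");
gated.simulateSuccess("192.0.2.10:5004", "198.51.100.7:40000", "farUser");
const session = gated.openMediaSession("192.0.2.10:5004");
console.log(refused, session.state);
```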

No more comments on these - good starting thoughts!