[RTW] peer-to-peer connection API thoughts

Wed Jan 5 03:01:13 CET 2011

I spent the holidays thinking about various ways of dealing with the 
connection API problem, and I thought I might put those thoughts out for 
some discussion rather than trying to pick one and write it up as a 
draft specification.

There are obviously a few models one might build upon for how to represent 
the peer-to-peer connections and bring them to life. The WebSockets API 
( http://dev.w3.org/html5/websockets/ ) is a solution to the 
client-server connection problem, and the way Flash Player extended its 
NetConnection and NetStream objects to support peer-to-peer connectivity 
over RTMFP ( 
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/flash/net/NetStream.html 
) is another model. And of course 
draft-alvestrand-dispatch-rtcweb-datagram ( 
http://tools.ietf.org/html/draft-alvestrand-dispatch-rtcweb-datagram-00 
) hints at how a URL scheme might be applied to some of this problem.

I believe the following items are givens, though feedback is of course 
welcome on these points as well:

1. When it is possible to establish a peer-to-peer connection for media, 
that should be supported (and is probably preferred)

2. Supporting peer-to-peer connections for media in the face of NA(P)T 
and firewalls will require UDP, and UDP hole-punching techniques (UDP is 
also preferable for delay-sensitive media)

3. Enabling a browser to send UDP datagrams to script- or 
server-designated IP addresses and ports is unsafe unless a handshake is 
used to verify that the other end is a willing participant

4. STUN Connectivity Check probes and responses (with short-term 
credentials) are a sufficiently strong handshake for the above (and this 
then suggests that ICE or a variant thereof is a good approach for the 
NAT traversal)

5. Given NAT and firewalls, it will also be necessary to support relayed 
UDP traffic; that is to say, a traffic flow that goes peer to relay server to peer

6. Given NAT and firewalls, it will further be necessary to support 
transporting the same media traffic over a TCP connection, for cases 
where UDP is blocked. This could be a TCP tunnel of the UDP datagrams, 
WebSockets, or something else.

-----

A few possible programming models to support this then present 
themselves, though this is by no means an exhaustive list...

A. A single connection object which can magically make everything work

In this scenario the API is essentially "var connection = new 
PeerConnection(destinationSpecifier);".

The destinationSpecifier would contain enough information for the 
PeerConnection object's implementation to use ICE to attempt to open a 
direct or relayed connection over UDP or, failing that, over TCP. It would 
contain the information necessary to communicate with an intermediary 
server as the ICE negotiation proceeds as well (a "low bandwidth" 
signaling channel is required between a pair of peers in many cases for 
exchange of pertinent information before the media connection can be 
established).

B. A single connection object which does most of the magic, but with the 
low-bandwidth signaling made external through events

In this case, the API is the same as above, but rather than providing in 
the destinationSpecifier some sort of server address for the "low 
bandwidth signaling" the connection object would instead fire events 
containing (essentially opaque to the channel, but in an agreed format) 
the data which needed to be transported to the other side. A 
corresponding API on the far end's "listening" connection object would 
allow for the data to be injected at the far end. The script implementer 
would simply need to wire the event up to an existing transport 
mechanism (HTTP, WebSockets).
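To make model B a little more concrete, here is a minimal sketch of the wiring a script might do. Everything in it is an assumption for illustration only: the class name SketchPeerConnection, the methods onSignalingMessage and injectSignalingMessage, and the JSON shape of the signaling data are hypothetical, and the "transport" is just a direct function call standing in for HTTP or WebSockets.

```typescript
// Hypothetical sketch of model B. None of these names come from a specification.
type SignalingHandler = (blob: string) => void;

class SketchPeerConnection {
  private name: string;
  private handlers: SignalingHandler[] = [];
  // Signaling data injected from the far end, kept so the sketch can be inspected.
  readonly received: string[] = [];

  constructor(name: string) {
    this.name = name;
  }

  // The script registers a handler for outbound signaling data.
  onSignalingMessage(handler: SignalingHandler): void {
    this.handlers.push(handler);
  }

  // The script injects signaling data that arrived from the far end.
  injectSignalingMessage(blob: string): void {
    this.received.push(blob);
  }

  // Stand-in for the browser deciding it has something to tell the peer,
  // for example a list of candidate addresses. The address is a placeholder.
  start(): void {
    const blob = JSON.stringify({ from: this.name, candidates: ["192.0.2.1:5000"] });
    for (const handler of this.handlers) {
      handler(blob);
    }
  }
}

// The script wires each side's events to the other side's injection API.
// A real page would carry the data over HTTP or WebSockets instead.
const callerB = new SketchPeerConnection("caller");
const calleeB = new SketchPeerConnection("callee");
callerB.onSignalingMessage((blob) => calleeB.injectSignalingMessage(blob));
calleeB.onSignalingMessage((blob) => callerB.injectSignalingMessage(blob));
callerB.start();
calleeB.start();
```

The point of the sketch is that the connection object never needs to know how the data travels; it only emits and accepts opaque strings.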

C. A connection object which does most of the magic, but with a parent 
object that is the connection to the signaling mechanism

This is the approach Flash Player has taken. The NetConnection is opened 
to the rendezvous-handling server (which can also act as a media relay); 
NetStreams are then opened either to that relaying server or directly to 
other peers. These peer-to-peer NetStreams use the open NetConnection 
for the peer setup signaling (determining the far IP addresses, 
requesting NAT hole-punching, etc.).
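The parent/child shape of model C might be sketched roughly as follows. This is not the Flash Player API; the names SketchSignalingConnection and SketchPeerStream, the server URL, the peer identifiers, and the message strings are all made up for illustration, and no network traffic is generated.

```typescript
// Hypothetical sketch of model C: child streams borrow the parent's signaling path.
class SketchSignalingConnection {
  private serverUrl: string;
  // Log of setup messages that would be sent to the rendezvous server.
  readonly sent: string[] = [];

  constructor(serverUrl: string) {
    this.serverUrl = serverUrl;
  }

  // Stand-in for sending a peer setup message through the server.
  sendSetupMessage(peerId: string, message: string): void {
    this.sent.push(`${this.serverUrl} -> ${peerId}: ${message}`);
  }
}

class SketchPeerStream {
  private parent: SketchSignalingConnection;
  readonly peerId: string;

  // Opening a stream to a peer uses the already-open parent connection
  // for setup signaling rather than opening a signaling path of its own.
  constructor(parent: SketchSignalingConnection, peerId: string) {
    this.parent = parent;
    this.peerId = peerId;
    this.parent.sendSetupMessage(peerId, "lookup-address");
    this.parent.sendSetupMessage(peerId, "request-hole-punch");
  }
}

const signalingC = new SketchSignalingConnection("rendezvous.example");
const streamToAlice = new SketchPeerStream(signalingC, "alice");
const streamToBob = new SketchPeerStream(signalingC, "bob");
```

Both streams share one signaling connection, which is the property that distinguishes this model from A and B.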

D. Multiple connection objects

In this case, the API is similar to case A for the UDP peer-to-peer 
channel but a *different* API and connection object (perhaps an 
extension of the existing WebSocket API) is used for the client-server 
fallback cases. This requires the script to determine when a direct 
connection cannot be made (which sometimes cannot be determined without 
trying it out). This makes the programming model more complex, but avoids 
duplication, as existing client-server models can be used for all 
relayed (and server-based) media cases without there being two (or more) 
different ways to move real-time media from a server to a browser (and 
vice versa).
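The script-side burden in model D might look something like the sketch below. It assumes two hypothetical "open" functions, one per API, that either return a channel or throw; the type and function names are invented here. A real version would almost certainly be asynchronous and would need a timeout, since failure of the direct path often only shows up by trying it. This version is synchronous only to keep the sketch short.

```typescript
// Hypothetical sketch of model D: the script chooses between two different APIs.
type ChannelKind = "direct-udp" | "client-server";

interface MediaChannel {
  kind: ChannelKind;
}

// An attempt either returns a usable channel or throws.
type OpenAttempt = () => MediaChannel;

// Try the peer-to-peer API first; if that fails, fall back to the
// existing client-server API (for example something WebSocket-like).
function openMediaChannel(openDirect: OpenAttempt, openViaServer: OpenAttempt): MediaChannel {
  try {
    return openDirect();
  } catch {
    return openViaServer();
  }
}

// Stand-ins for the two APIs. Here the direct attempt fails, as it would
// when UDP is blocked between the peers.
const blockedDirect: OpenAttempt = () => {
  throw new Error("no UDP path to peer");
};
const workingDirect: OpenAttempt = () => ({ kind: "direct-udp" });
const viaServer: OpenAttempt = () => ({ kind: "client-server" });

const fallbackChannel = openMediaChannel(blockedDirect, viaServer);
const directChannel = openMediaChannel(workingDirect, viaServer);
```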

E. Script-intensive (but flexible) components

In this model, an API exists which exposes a nearly bare UDP socket, but 
with permissions restricted such that traffic may not be sent unless 
handshake probes are sent and received. Maximum flexibility, at some 
significant cost in script complexity (though this could be encapsulated 
into well-known, even open-source, libraries).

An example API for this would be:
   var connection = new PeerConnection(ICEusernameString, ICEpasswordString);
      opens the local UDP port, nothing is permitted at this point
   PeerConnection.testConnectivity(farSockaddrString, farUsernameString, farPasswordString);
      sends STUN connectivity checks
   PeerConnection.onConnectivityTest(receivedSockaddrString, usernameString);
      event that fires when a connectivity test is received with valid 
      credentials. Browser also automatically generates the transaction response.
   PeerConnection.onConnectivitySuccess(receivedSockaddrString, reflexiveAddrString, usernameString);
      event that fires when the connectivity test transaction response 
      arrives... Browser then adds this destination to a whitelist of 
      addresses that media streams may be sent to
   PeerConnection.openMediaSession(sockaddrString)
      fails immediately if the address isn't on the whitelist, otherwise 
      starts a DTLS handshake with the specified address
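
The whitelist rule in model E can be illustrated with a small simulation. This is only a sketch of the permission logic, not of a real socket: the class SketchUdpPeer and the method handleConnectivitySuccess are hypothetical stand-ins for what the browser would do internally, no STUN or DTLS traffic is generated, and credentials are omitted for brevity.

```typescript
// Hypothetical simulation of the model E permission rule.
class SketchUdpPeer {
  // Addresses we have probed but not yet heard back from.
  private pending = new Set<string>();
  // Addresses that completed a connectivity check.
  private whitelist = new Set<string>();

  // Stand-in for sending STUN connectivity checks to an address.
  testConnectivity(farSockaddr: string): void {
    this.pending.add(farSockaddr);
  }

  // Stand-in for the browser receiving a valid transaction response.
  // Returns false for a response we never asked for.
  handleConnectivitySuccess(receivedSockaddr: string): boolean {
    if (!this.pending.delete(receivedSockaddr)) {
      return false;
    }
    this.whitelist.add(receivedSockaddr);
    return true;
  }

  // Fails immediately unless the address has passed a connectivity check.
  openMediaSession(sockaddr: string): string {
    if (!this.whitelist.has(sockaddr)) {
      throw new Error(`${sockaddr} has not passed a connectivity check`);
    }
    return `DTLS handshake started with ${sockaddr}`;
  }
}

const peerE = new SketchUdpPeer();
peerE.testConnectivity("198.51.100.7:4000");
peerE.handleConnectivitySuccess("198.51.100.7:4000");
const sessionStatus = peerE.openMediaSession("198.51.100.7:4000");
```

Calling openMediaSession on an address that was never probed, or whose response never arrived, throws instead of sending anything, which is the safety property item 3 above asks for.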

----

Hopefully this will stimulate discussion and/or additional ideas from 
other folks who are working on designing and/or implementing solutions 
to this problem.

Matthew Kaufman
matthew.kaufman at skype.net
Skype: atmatthewat