Document: draft-ietf-rserpool-asap-19
Reviewer: Spencer Dawkins
Review Date: 2008-04-16
IETF LC End Date: 2008-04-14 (sorry!)
IESG Telechat date:

Summary: This document is close to ready for publication as an Experimental RFC. I have some specific comments below, but nothing that's a show-stopper.

If this document were to advance onto the standards track, I'd recommend a very tight editing pass on the RFC 2119 language, especially looking for anything that seems to be capitalized for emphasis, and I'd recommend clearer statements of why some of the SHOULDs aren't MUSTs. I don't see any reason to hold up publication as Experimental for this.

Comments:

1.1.  Definitions

   Home ENRP server:  The ENRP server to which a PE or PU currently
      sends all namespace service requests.  A PE MUST only have one

Spencer (clarity): I'm not wild about 2119 language in terminology sections, but at the very least, this section comes before you describe the 2119 conventions in Section 1.4...

      home ENRP server at any given time and both the PE and its home
      ENRP server MUST know and keep track of this relationship.  A PU
      SHOULD select one of the available ENRP servers as its Home ENRP
      server but the collective ENRP servers may change this by the
      sending or a ASAP_ENDPOINT_KEEP_ALIVE message.

1.2.  Organization of this document

   Section 2 details the ASAP message formats.  In Section 3 we provide
   detailed ASAP procedures for for the ASAP implementer.  In Section 6

Spencer (clarity): interesting jump from Sec 3 to Sec 6... ;-)

   we give details of the ASAP interface, focusing on the communication
   primitives between ASAP the applications above ASAP and ASAP itself,
   and the communications primitives between ASAP and SCTP (or other
   transport layers).  Also included in this discussion are relevant
   timers and configurable parameters as appropriate.  Section 7
   provides threshold and protocol variables.

2.2.  ASAP Messages

   0x0d       - ASAP_BUSINESS_CARD

Spencer (clarity): it would be nice to see "business card" called out in the terminology section, at a minimum.

3.5.  Unreachable endpoints

   Optionally, an ENRP server may also periodically send point-to-point
   ASAP_ENDPOINT_KEEP_ALIVE (with 'H' flag set to '0') messages to each
   of the PEs owned by the ENRP server in order to check their
   reachability status.  If the send of ASAP_ENDPOINT_KEEP_ALIVE to a PE
   fails, the ENRP server MUST consider the PE as unreachable and MUST
   remove the PE from its handlespace .  Note, if an ENRP server owns a
   large number of PEs, the implementation should pay attention not to
   flood the network with bursts of ASAP_ENDPOINT_KEEP_ALIVE messages.
   Instead, the implementation MUST distribute the
   ASAP_ENDPOINT_KEEP_ALIVE message traffic over a time period.

Spencer (comment): Is this a requirement for application-level behavior beyond what TCP or SCTP would do at the transport level? If so ... I'd expect more guidance here (if everyone knew how to "pay attention not to flood the network with bursts of messages", we'd all be using UDP).
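
To make the request for guidance concrete, here is a minimal sketch of one possible reading of "distribute ... over a time period": space the keep-alives evenly across one check interval instead of sending them back to back. This is only an illustration, not something the draft specifies; the function name, the interval parameter, and the PE identifiers are all hypothetical.

```python
def keepalive_schedule(pe_ids, interval_s, start_s=0.0):
    """Return (send_time, pe_id) pairs spread evenly across one interval.

    Assumption: even spacing is just one way to avoid bursts; the draft
    does not say which pacing scheme an implementation should use.
    """
    pes = list(pe_ids)
    if not pes:
        return []
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    step = interval_s / len(pes)  # gap between consecutive keep-alives
    return [(start_s + i * step, pe) for i, pe in enumerate(pes)]


# Hypothetical usage: four PEs checked over an 8-second interval.
schedule = keepalive_schedule(["pe1", "pe2", "pe3", "pe4"], 8.0)
```

Even a sentence in the draft naming the interval and the spacing rule would answer most of the question above.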

3.7.1.  SCTP Send Failure

   In such a case, the ASAP endpoint should not re-send the
   undeliverable message.  Instead, it should discard the message and
   start the ENRP server hunt procedure as described in Section 3.6 .

Spencer (comment): I'm not sure why these "should"s are non-normative, and I'm not sure why the first "should" is not MUST.

   After finding a new Home ENRP server, the ASAP endpoint should
   reconstruct and retransmit the request.

   Note that an ASAP endpoint MAY also choose to NOT discard the
   message, but to queue it for retransmission after a new Home ENRP
   server is found.  If an ASAP endpoint does choose to discard the
   message, after a new Home ENRP server is found, the ASAP endpoint
   MUST be capable of reconstructing the original request.

Spencer (comment): This seems to go deep into implementation detail rather than protocol interoperation (whether I discard a message and reconstruct it, or queue it for retransmission, would be up to the implementer; or do I misunderstand?).
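
A rough sketch of why the two options in the quoted text look equivalent from the outside, assuming the request can be rebuilt deterministically from its parameters. Everything here (the class, the encode_request helper, the parameter dictionary) is hypothetical and not taken from the draft.

```python
from collections import deque


def encode_request(params):
    """Hypothetical stand-in for building an ASAP request from its parameters."""
    return repr(sorted(params.items())).encode("utf-8")


class PendingRequests:
    """Tracks requests that failed to send until a new Home ENRP server is found."""

    def __init__(self, keep_messages):
        # keep_messages=True: queue the encoded message for retransmission.
        # keep_messages=False: discard it and keep only what is needed to rebuild.
        self.keep_messages = keep_messages
        self._pending = deque()

    def on_send_failure(self, params, encoded):
        kept = encoded if self.keep_messages else None
        self._pending.append((params, kept))

    def on_new_home_server(self):
        """Return the messages to transmit to the new Home ENRP server."""
        out = []
        while self._pending:
            params, kept = self._pending.popleft()
            out.append(kept if kept is not None else encode_request(params))
        return out
```

Under that assumption, both strategies hand the same bytes to the transport, which is why the choice reads to me as an implementation matter.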

3.8.  Cookie handling procedures

   Note: a control channel is a communication channel between a PU and
   PE that does not end in data passed to the user.  This is

Spencer (clarity): s/does not end in/does not carry/ ?

   accomplished with SCTP by using a PPID to separate the ASAP messages
   (Cookie and Business Card) from normal data messages.

6.5.2.1.  Round Robin Policy

   When an ASAP endpoint sends messages by Pool Handle and Round-Robin
   is the current policy of that Pool, the ASAP endpoint of the sender
   will select the receiver for each outbound message by round-Robining
   through all the registered PEs in that Pool, in an attempt to achieve
   an even distribution of outbound messages.  Note that in a large
   server pool, the ENRP server MAY not send back all PEs to the ASAP

Spencer (comment): Is this supposed to be a 2119 MAY? Or is it more like "might not"?

   client.  In this case the client or PU will be performing a round
   robin policy on a subset of the entire Pool.
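
For illustration, a minimal sketch of round-robin selection at the PU over whatever PE list it currently holds, which per the text above may be only a subset of the pool. The class and method names are hypothetical, and resetting the position when the list is refreshed is my assumption, not something the draft states.

```python
class RoundRobinSelector:
    """Cycles through the PEs currently known to this ASAP endpoint."""

    def __init__(self, pes):
        self._pes = list(pes)
        self._next = 0

    def update(self, pes):
        """Replace the known PE list, e.g. after a new handle resolution.

        Assumption: the rotation restarts from the first entry.
        """
        self._pes = list(pes)
        self._next = 0

    def select(self):
        """Return the next PE in rotation."""
        if not self._pes:
            raise LookupError("no PEs known for this pool")
        pe = self._pes[self._next]
        self._next = (self._next + 1) % len(self._pes)
        return pe


# Hypothetical usage: the ENRP server returned three PEs of a larger pool.
selector = RoundRobinSelector(["pe-a", "pe-b", "pe-c"])
picks = [selector.select() for _ in range(5)]
```

Note that the distribution is even only across the subset the PU was given, not across the whole pool.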

6.5.5.  Message Delivery Options

      Note that this is a best-effort service.  Applications should be
      aware that messages can be lost during the failover process, even
      if the underlying transport supports retrieval of unacknowledged
      data (e.g.  SCTP) (Example: messages acknowledged by the SCTP
      layer at a PE, but not yet read by the PE when a PE failure
      occurs.)  In the case where the underlying transport does not
      support such retrieval (e.g.  TCP), any data already submitted by
      ASAP to the transport layer MAY be lost upon failover.