Document: draft-ietf-rserpool-asap-19
Reviewer: Spencer Dawkins
Review Date: 2008-04-16
IETF LC End Date: 2008-04-14 (sorry!)
IESG Telechat date:

Summary: This document is close to ready for publication as an Experimental RFC. I have some specific comments below, but nothing that's a show-stopper. If this document were to advance onto the standards track, I'd recommend a very tight editing pass on 2119 language, especially looking for anything that seems to be capitalized for emphasis, and I'd recommend clearer statements for why some of the SHOULDs aren't MUSTs. I don't see any reason to hold up for this when publishing as Experimental.

Comments:

1.1. Definitions

Home ENRP server: The ENRP server to which a PE or PU currently sends all namespace service requests. A PE MUST only have one

Spencer (clarity): I'm not wild about 2119 language in terminology sections, but at the very least, this section comes before you describe the 2119 conventions in Section 1.4...

home ENRP server at any given time and both the PE and its home ENRP server MUST know and keep track of this relationship. A PU SHOULD select one of the available ENRP servers as its Home ENRP server but the collective ENRP servers may change this by the sending or a ASAP_ENDPOINT_KEEP_ALIVE message.

1.2. Organization of this document

Section 2 details the ASAP message formats. In Section 3 we provide detailed ASAP procedures for for the ASAP implementer. In Section 6

Spencer (clarity): interesting jump from Sec 3 to Sec 6... ;-)

we give details of the ASAP interface, focusing on the communication primitives between ASAP the applications above ASAP and ASAP itself, and the communications primitives between ASAP and SCTP (or other transport layers). Also included in this discussion are relevant timers and configurable parameters as appropriate. Section 7 provides threshold and protocol variables.

2.2. ASAP Messages

0x0d - ASAP_BUSINESS_CARD

Spencer (clarity): it would be nice to see "business card" called out in the terminology section, at a minimum.

3.5. Unreachable endpoints

Optionally, an ENRP server may also periodically send point-to-point ASAP_ENDPOINT_KEEP_ALIVE (with 'H' flag set to '0') messages to each of the PEs owned by the ENRP server in order to check their reachability status. If the send of ASAP_ENDPOINT_KEEP_ALIVE to a PE fails, the ENRP server MUST consider the PE as unreachable and MUST remove the PE from its handlespace. Note, if an ENRP server owns a large number of PEs, the implementation should pay attention not to flood the network with bursts of ASAP_ENDPOINT_KEEP_ALIVE messages. Instead, the implementation MUST distribute the ASAP_ENDPOINT_KEEP_ALIVE message traffic over a time period.

Spencer (comment): Is this a requirement for application-level behavior beyond what TCP or SCTP would do at the transport level? If so ... I'd expect more guidance here (if everyone knew how to "pay attention not to flood the network with bursts of messages", we'd all be using UDP).

3.7.1. SCTP Send Failure

In such a case, the ASAP endpoint should not re-send the undeliverable message. Instead, it should discard the message and start the ENRP server hunt procedure as described in Section 3.6.

Spencer (comment): I'm not sure why these "should"s are non-normative, and I'm not sure why the first "should" is not MUST.

After finding a new Home ENRP server, the ASAP endpoint should reconstruct and retransmit the request. Note that an ASAP endpoint MAY also choose to NOT discard the message, but to queue it for retransmission after a new Home ENRP server is found. If an ASAP endpoint does choose to discard the message, after a new Home ENRP server is found, the ASAP endpoint MUST be capable of reconstructing the original request.

Spencer (comment): this seems way deep into implementation, not into protocol interoperation (whether I discard a message and reconstruct it, or queue it for retransmission, would be up to the implementer, or do I misunderstand?).

3.8. Cookie handling procedures

Note: a control channel is a communication channel between a PU and PE that does not end in data passed to the user. This is

Spencer (clarity): s/does not end in/does not carry/ ?

accomplished with SCTP by using a PPID to separate the ASAP messages (Cookie and Business Card) from normal data messages.

6.5.2.1. Round Robin Policy

When an ASAP endpoint sends messages by Pool Handle and Round-Robin is the current policy of that Pool, the ASAP endpoint of the sender will select the receiver for each outbound message by round-Robining through all the registered PEs in that Pool, in an attempt to achieve an even distribution of outbound messages. Note that in a large server pool, the ENRP server MAY not send back all PEs to the ASAP

Spencer (comment): is this supposed to be a 2119 MAY? Or is it more like "might not"?

client. In this case the client or PU will be performing a round robin policy on a subset of the entire Pool.

6.5.5. Message Delivery Options

Note that this is a best-effort service. Applications should be aware that messages can be lost during the failover process, even if the underlying transport supports retrieval of unacknowledged data (e.g. SCTP) (Example: messages acknowledged by the SCTP layer at a PE, but not yet read by the PE when a PE failure occurs.) In the case where the underlying transport does not support such retrieval (e.g. TCP), any data already submitted by ASAP to the transport layer MAY be lost upon failover.
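
Illustration for the Section 3.5 comment: the draft says the implementation MUST "distribute the ASAP_ENDPOINT_KEEP_ALIVE message traffic over a time period" but does not say how. The sketch below shows one possible reading, in which an ENRP server spaces its keep-alive sends evenly across a period. The function name, the PE identifiers, and the 60-second period are all hypothetical and do not come from the draft; this is only an example of the kind of guidance the comment asks for, not a recommended algorithm.

```python
def keepalive_schedule(pe_ids, period_s):
    """Return (offset_seconds, pe_id) pairs spread evenly over period_s.

    Hypothetical helper: neither the name nor the even-spacing rule is
    taken from the draft. It only illustrates sending one keep-alive
    every period_s / N seconds instead of N messages in a single burst.
    """
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    n = len(pe_ids)
    if n == 0:
        return []
    spacing = period_s / n  # gap between consecutive sends
    return [(i * spacing, pe) for i, pe in enumerate(pe_ids)]


# Example: four PEs owned by one ENRP server, one pass per 60 seconds.
for offset, pe in keepalive_schedule(["pe-1", "pe-2", "pe-3", "pe-4"], 60.0):
    print(f"t+{offset:5.1f}s  ASAP_ENDPOINT_KEEP_ALIVE -> {pe}")
```

Even spacing is the simplest choice; whether the draft intends something like this, or relies on transport-level congestion control instead, is exactly the open question in the comment.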