Document: draft-ietf-rohc-sigcomp-impl-guide-09.txt Reviewer: Spencer Dawkins Review Date: 2006-12-18 IETF LC End Date: 2006-12-21 Summary: This document is almost ready for publication as a Proposed Standard, although I do have questions and comments for the authors. Comments: The questions and comments in Section 2.2, 3.1, 3.2, 3.4, 4, 4.1 should be addressed as part of Last Call comment resolution. Other items, marked as "Nit -", may be fixed in any editing pass. Abstract This document describes common misinterpretations and some ambiguities in the Signalling Compression Protocol (SigComp), and offers guidance to developers to clarify any resultant problems. Spencer: Nit - "clarify any resultant problems" seems strange - "clarify any resultant confusion", or "resolve any resultant problems"? SigComp defines a scheme for compressing messages generated by application protocols such as the Session Initiation Protocol (SIP). This document (if approved) updates: RFC 3320, RFC 3321, RFC 3485. Spencer: Nit - IETF is not entirely consistent about whether implementation guides simply clarify, or if they correct as well. It might be helpful to say "clarifies and corrects text in the following updated RFCs". 2.1 Bytecode within Decompression Memory Size SigComp [1] states that the default Decompression Memory Size (DMS) is 2K. The UDVM memory size is defined in section RFC3320:7 to be (DMS - sizeof (sigcomp_msg)) for messages transported over UDP and (DMS / 2) for those transported over TCP. This means that when the message contains the bytecode (as it will for at least the first message) there will actually be two copies of the bytecode within the decompressor memory (see Figure 1). It is correct that there are two copies of the bytecode within the decompressor memory, in this case. Spencer: Nit - the last two sentences were difficult to read because so much of the text is duplicated. Perhaps "The presence of the second bytecode copy is not an error in this case"? |<----------------------------DMS--------------------------------->| |<-----sigcomp message---->|<------------UDVM memory size--------->| +-+----------+-------------+-----+----------+----------------------+ | | bytecode | comp msg | | bytecode | circular buffer | +-+----------+-------------+-----+----------+----------------------+ ^ ^ | | Sigcomp header Low bytes of UDVM Figure 1: Bytecode and UDVM memory size within DMS 2.2 Default Decompression Memory Size For many implementations, the length of decompression bytecode sent is in the range of three to four hundred bytes. Because SigComp specifies a default DMS of 2K, the described scheme seriously restricts the size of the circular buffer, and of the compressed message itself. In some cases, this set of circumstances has a damaging effect on the compression ratio; for others, it makes it completely impossible to send certain messages compressed. To address this problem, those mandating the use of Sigcomp need to also provide further specification for their application that mandates the use of an appropriately sized DMS. Spencer: I'm concerned that section 2.2 provides a heuristic to identify an unreasonable length, but not a reasonable one. If someone didn't do the analysis, would a 1K length be an improvement? a 2K length? If you can provide any guidance, that would likely be helpful. 3. UDVM Instructions 3.1 Data Input Instructions When inputting data from the compressed message, the INPUT-BYTES (section RFC3320-9.4.2) and INPUT-BITS (section RFC3302-9.4.3) instructions both have the paragraph: "If the instruction requests data that lies beyond the end of the SigComp message, no data is returned. Instead the UDVM moves program execution to the address specified by the address operand." The intent is that if n bytes/bits are requested but only m are left in the message (where m < n) then no bytes/bits MUST be returned to the UDVM and the m bytes/bits that are there MUST remain in the message unchanged. Spencer: the first MUST doesn't seem clear - perhaps "the decompressor dispatcher MUST NOT return any bytes/bits to the UDVM, and the m bytes/bits available MUST remain in the message unchanged"? But "no bytes MUST be returned" isn't "MUST NOT return any BYTES, IIUC. For example, if the remaining bytes of a message are: 0x01 0x02 0x03 and the UDVM encounters an INPUT-BYTES (6, a, b) instruction. The decompressor dispatcher returns no bytes and jumps to the instruction specified by b. This contains an INPUT-BYTES (2, c, d) instruction so the decompressor dispatcher successfully returns the bytes 0x01 and 0x02. In the case where an INPUT-BYTES instruction follows an INPUT-BITS instruction that has left a partial byte in the message, the partial byte should still be thrown away even if there are not enough bytes to input. INPUT-BYTES (0, a, b) can be used to flush out a partial byte. 3.2 MULTILOAD Spencer: The description of MULTILOAD isn't completely clear to me (I THINK I understand what is being described, but "not allowed to be self modifying" is ambiguous). It would be clearer to me, I think, if the first sentence was "Any indirection of MULTILOAD parameters must be done at execution time, in order to make step-by-step implementation simpler", and the "That is" sentence at the end of the paragraph was deleted. Is this what the paragraph is trying to say? The MULTILOAD instruction is explicitly not allowed to be self modifying in order to make step-by-step implementation simpler. Therefore, any other implementation technique (e.g. decode all operands then execute, which is the model of all other instructions) MUST yield the same result as a step-by-step implementation would. That is, if there is any indirection of parameters, the indirection MUST be done at execution time. For example: 3.4 Using the stack The instructions PUSH, POP, CALL and RETURN make use of a stack which is set up using the well known memory address stack_location to define where in memory the stack is located. Use of the stack is defined in section RFC3320-8.3 which states that: '"Pushing" a value on the stack is an abbreviation for copying the value to stack[stack_fill] and then increasing stack_fill by 1.' and 'stack_fill is an abbreviation for the 2-byte word at stack_location and stack_location + 1'. In the very rare case that the value of stack_fill is 0xFFFF, the new value pushed onto the stack will be Spencer: this text describes a problem and almost obscures the solution. Suggest "In the very rare case that the value of stack_fill is 0xFFFF, the original stack_fill value should be increased by 1 to 0x0000 and written back to stack_location + 1 (which will overwrite the value that has been pushed onto the stack". Spencer: I'm confused about why some of this text is "MUST (NOT)", but "increased by 1 to 0x0000" is only a "should" - and isn't 2119 language. written to stack [0xFFFF] = stack_location. Stack_fill would then be increased by 1, however, the value at stack_location and stack_location + 1 has just been updated. To maintain the integrity of the stack with regard to over and underflow, stack_fill MUST NOT be re-read. The original stack_fill value of 0xFFFF should be increased by 1 to 0x0000 and written back to stack_location and stack_location + 1 (which will overwrite the value that has been pushed onto the stack). 4. Byte Copying Rules Section RFC3320-8.4 states that "The string of bytes is copied in ascending order of memory address, respecting the bounds set by Spencer: the following text also seems misleading :-) or at least confusing. Is it saying that attempts to copy bytes outside the bounds will still result in circular buffer references within the UDVM memory, so they are OK, or something else? I'm not seeing a clear statement of what happens when I copy to the left of byte_copy_left, etc. byte_copy_left and byte_copy_right." This is misleading in that it is perfectly legitimate to copy bytes outside of the bounds set by byte_copy_left and byte_copy_right. Byte_copy_left and byte_copy_right provide the ability to maintain a circular buffer as follows: For moving to the right if current_byte == ((byte_copy_right - 1) mod 2 ^ 16): next_byte = byte_copy_left else: next_byte = (current_byte + 1) mod 2 ^ 16 which is equivalent to the algorithm given in section 8.4. For moving to the left if current_byte == byte_copy_left: previous_byte = (byte_copy_right - 1) mod 2 ^ 16 else: previous_byte = (current_byte - 1) mod 2 ^ 16 Moving to the left is only used for COPY_OFFSET. Consequently, copying could begin to the left of byte_copy_left and continue across it (and jump back to it according to the given algorithm if necessary) and could begin at or to the right of byte_copy_right (though care must be taken to prevent decompression failure due to writing to / reading from beyond the UDVM memory). 4.1 Instructions That Use Byte Copying Rules Section RFC3320-8.4 specifies the byte copying rules and includes a list of the instructions that obey them. STATE-CREATE is not in this list but END-MESSAGE is. This caused confusion due to the fact that neither instruction actually does any byte copying, rather both instructions give information to the state-handler to create state. Logically both instructions should have the same information about byte copying. Spencer: so is this text saying that STATE-CREATE should be in the list? Or something else? When state is created by the state-handler (whether the instruction was from END-MESSAGE or STATE-CREATE), the byte copying rules of section RFC3320-8.4 apply. Note that, if the contents of the UDVM changes between the occurrence of the STATE-CREATE instruction and the state being created, the bytes that are stored are those in the buffer at the time of creation (i.e. when the message has been decompressed and authenticated). CRC is not mentioned in section RFC3320-8.4 in the list of instructions that obey byte copying rules, but its description in section RFC3320-9.3.5 states that these rules are to be obeyed. When reading data over which to perform the crc check, byte copying rules apply as specified in section RFC3320-8.4. Spencer: again - is this text saying that CRC should be in the list? When the partial identifier for a STATE-FREE instruction is read, (during the execution of END-MESSAGE) byte copying rules as per section RFC3320-8.4 apply. 12. RFC 3485 SIP/SDP Static Dictionary However, there are some minor bugs in the dictionary. In a number of places, the entry in the offset table refers to an address that is not in the corresponding priority section in the list of strings. Consequently, if the bytecode uses the offset table and limits use of the dictionary to priorities less than 4, then care must be taken not to use the following strings in the dictionary: 'application' at 0x0334 is not at priority 2 (it's priority 4) 'sdp' at 0x064b is not at priority 2 (it's priority 4) 'send' at 0x089d is not at priority 2 (it's priority 3) 'recv' at 0x0553 is not at priority 2 (it's priority 4) 'phone' at 0x00f2 is not at priority 3 (it's priority 4) These are seen to be relatively low cost bugs as only these 5 strings are affected and they are only affected under certain conditions. Spencer: Nit - I can't make myself object to documenting this caution instead of updating the dictionary specification (here or elsewhere), but I wish I could... :-( Is the assumption that it's not worth the disruption of bumping the version number? 15. Acknowledgements Spencer: Nit - the first sentence here isn't clear about who is being foolish (and it's a nice comment that needs to be clear in order to be clever). Largely through being foolish enough to be authors of, or to have implemented, SigComp we would like to thank the following: Richard Price Lajos Zaccomer Timo Forsman Tor-Erik Malen Jan Christoffersson Kwang Mien Chan William Kembery Pekka Pessi for their confusion, suggestions and comments. Appendix A. Dummy Application Protocol (DAP) A.1 Introduction This appendix defines a simple dummy application protocol (DAP) that can be used for SigComp interoperability testing. This is handy for SigComp implementations that are not integrated with SIP stack. It also provides some features that facilitate the tests of SigComp internal operations. Spencer: thank you guys for including this. I wish all our protocol specs were this well-defined ;-}