Document: draft-ietf-rohc-sigcomp-impl-guide-09.txt
Reviewer: Spencer Dawkins
Review Date:  2006-12-18
IETF LC End Date: 2006-12-21

Summary: This document is almost ready for publication as a Proposed
Standard, although I do have questions and comments for the authors.

Comments: The questions and comments in Section 2.2, 3.1, 3.2, 3.4, 4, 4.1
should be addressed as part of Last Call comment resolution. Other items,
marked as "Nit -", may be fixed in any editing pass.

Abstract

   This document describes common misinterpretations and some
   ambiguities in the Signalling Compression Protocol (SigComp), and
   offers guidance to developers to clarify any resultant problems.

Spencer: Nit - "clarify any resultant problems" seems strange - "clarify any
resultant confusion", or "resolve any resultant problems"?

   SigComp defines a scheme for compressing messages generated by
   application protocols such as the Session Initiation Protocol (SIP).
   This document (if approved) updates: RFC 3320, RFC 3321, RFC 3485.

Spencer: Nit - IETF is not entirely consistent about whether implementation
guides simply clarify, or if they correct as well. It might be helpful to
say "clarifies and corrects text in the following updated RFCs".

2.1  Bytecode within Decompression Memory Size

   SigComp [1] states that the default Decompression Memory Size (DMS)
   is 2K. The UDVM memory size is defined in section RFC3320:7 to be
   (DMS - sizeof (sigcomp_msg)) for messages transported over UDP and
   (DMS / 2) for those transported over TCP.  This means that when the
   message contains the bytecode (as it will for at least the first
   message) there will actually be two copies of the bytecode within the
   decompressor memory (see Figure 1).  It is correct that there are two
   copies of the bytecode within the decompressor memory, in this case.

Spencer: Nit - the last two sentences were difficult to read because so much
of the text is duplicated. Perhaps "The presence of the second bytecode copy 
is not an error in this case"?

    |<----------------------------DMS--------------------------------->|
    |<-----sigcomp message---->|<------------UDVM memory size--------->|
    +-+----------+-------------+-----+----------+----------------------+
    | | bytecode |  comp msg   |     | bytecode | circular buffer      |
    +-+----------+-------------+-----+----------+----------------------+
     ^                            ^
     |                            |
    Sigcomp header          Low bytes of UDVM


            Figure 1: Bytecode and UDVM memory size within DMS


2.2  Default Decompression Memory Size

   For many implementations, the length of decompression bytecode sent
   is in the range of three to four hundred bytes.  Because SigComp
   specifies a default DMS of 2K, the described scheme seriously
   restricts the size of the circular buffer, and of the compressed
   message itself.  In some cases, this set of circumstances has a
   damaging effect on the compression ratio; for others, it makes it
   completely impossible to send certain messages compressed.

   To address this problem, those mandating the use of Sigcomp need to
   also provide further specification for their application that
   mandates the use of an appropriately sized DMS.

Spencer: I'm concerned that section 2.2 provides a heuristic to identify an
unreasonable length, but not a reasonable one. If someone didn't do the
analysis, would a 1K length be an improvement? a 2K length? If you can
provide any guidance, that would likely be helpful.

3.  UDVM Instructions

3.1  Data Input Instructions

   When inputting data from the compressed message, the INPUT-BYTES
   (section RFC3320-9.4.2) and INPUT-BITS (section RFC3302-9.4.3)
   instructions both have the paragraph:

   "If the instruction requests data that lies beyond the end of the
   SigComp message, no data is returned.  Instead the UDVM moves program
   execution to the address specified by the address operand."

   The intent is that if n bytes/bits are requested but only m are left
   in the message (where m < n) then no bytes/bits MUST be returned to
   the UDVM and the m bytes/bits that are there MUST remain in the
   message unchanged.

Spencer: the first MUST doesn't seem clear - perhaps "the decompressor
dispatcher MUST NOT return any bytes/bits to the UDVM, and the m bytes/bits
available MUST remain in the message unchanged"? But "no bytes MUST be
returned" isn't "MUST NOT return any BYTES, IIUC.

   For example, if the remaining bytes of a message are: 0x01 0x02 0x03
   and the UDVM encounters an INPUT-BYTES (6, a, b) instruction.  The
   decompressor dispatcher returns no bytes and jumps to the instruction
   specified by b.  This contains an INPUT-BYTES (2, c, d) instruction
   so the decompressor dispatcher successfully returns the bytes 0x01
   and 0x02.

   In the case where an INPUT-BYTES instruction follows an INPUT-BITS
   instruction that has left a partial byte in the message, the partial
   byte should still be thrown away even if there are not enough bytes
   to input.

   INPUT-BYTES (0, a, b) can be used to flush out a partial byte.

3.2  MULTILOAD

Spencer: The description of MULTILOAD isn't completely clear to me (I THINK
I understand what is being described, but "not allowed to be self modifying"
is ambiguous). It would be clearer to me, I think, if the first sentence was
"Any indirection of MULTILOAD parameters must be done at execution time, in
order to make step-by-step implementation simpler", and the "That is"
sentence at the end of the paragraph was deleted. Is this what the paragraph
is trying to say?

   The MULTILOAD instruction is explicitly not allowed to be self
   modifying in order to make step-by-step implementation simpler.
   Therefore, any other implementation technique (e.g. decode all
   operands then execute, which is the model of all other instructions)
   MUST yield the same result as a step-by-step implementation would.
   That is, if there is any indirection of parameters, the indirection
   MUST be done at execution time.  For example:

3.4  Using the stack

   The instructions PUSH, POP, CALL and RETURN make use of a stack which
   is set up using the well known memory address stack_location to
   define where in memory the stack is located.  Use of the stack is
   defined in section RFC3320-8.3 which states that: '"Pushing" a value
   on the stack is an abbreviation for copying the value to
   stack[stack_fill] and then increasing stack_fill by 1.' and
   'stack_fill is an abbreviation for the 2-byte word at stack_location
   and stack_location + 1'.  In the very rare case that the value of
   stack_fill is 0xFFFF, the new value pushed onto the stack will be

Spencer: this text describes a problem and almost obscures the solution.
Suggest "In the very rare case that the value of stack_fill is 0xFFFF, the
original stack_fill value should be increased by 1 to 0x0000 and written
back to stack_location + 1 (which will overwrite the value that has been
pushed onto the stack".

Spencer: I'm confused about why some of this text is "MUST (NOT)", but
"increased by 1 to 0x0000" is only a "should" - and isn't 2119 language.

   written to stack [0xFFFF] = stack_location.  Stack_fill would then be
   increased by 1, however, the value at stack_location and
   stack_location + 1 has just been updated.  To maintain the integrity
   of the stack with regard to over and underflow, stack_fill MUST NOT
   be re-read.  The original stack_fill value of 0xFFFF should be
   increased by 1 to 0x0000 and written back to stack_location and
   stack_location + 1 (which will overwrite the value that has been
   pushed onto the stack).

4.  Byte Copying Rules

   Section RFC3320-8.4 states that "The string of bytes is copied in
   ascending order of memory address, respecting the bounds set by

Spencer: the following text also seems misleading :-) or at least confusing.
Is it saying that  attempts to copy bytes outside the bounds will still
result in circular buffer references within the UDVM memory, so they are OK,
or something else? I'm not seeing a clear statement of what happens when I
copy to the left of byte_copy_left, etc.

   byte_copy_left and byte_copy_right."  This is misleading in that it
   is perfectly legitimate to copy bytes outside of the bounds set by
   byte_copy_left and byte_copy_right.  Byte_copy_left and
   byte_copy_right provide the ability to maintain a circular buffer as
   follows:

   For moving to the right

   if current_byte == ((byte_copy_right - 1) mod 2 ^ 16):
       next_byte = byte_copy_left
   else:
       next_byte = (current_byte + 1) mod 2 ^ 16

   which is equivalent to the algorithm given in section 8.4.

   For moving to the left

   if current_byte == byte_copy_left:
       previous_byte = (byte_copy_right - 1) mod 2 ^ 16
   else:
       previous_byte = (current_byte - 1) mod 2 ^ 16

   Moving to the left is only used for COPY_OFFSET.

   Consequently, copying could begin to the left of byte_copy_left and
   continue across it (and jump back to it according to the given
   algorithm if necessary) and could begin at or to the right of
   byte_copy_right (though care must be taken to prevent decompression
   failure due to writing to / reading from beyond the UDVM memory).

4.1  Instructions That Use Byte Copying Rules

   Section RFC3320-8.4 specifies the byte copying rules and includes a
   list of the instructions that obey them.  STATE-CREATE is not in this
   list but END-MESSAGE is.  This caused confusion due to the fact that
   neither instruction actually does any byte copying, rather both
   instructions give information to the state-handler to create state.
   Logically both instructions should have the same information about
   byte copying.

Spencer: so is this text saying that STATE-CREATE should be in the list? Or
something else?

   When state is created by the state-handler (whether the instruction
   was from END-MESSAGE or STATE-CREATE), the byte copying rules of
   section RFC3320-8.4 apply.

   Note that, if the contents of the UDVM changes between the occurrence
   of the STATE-CREATE instruction and the state being created, the
   bytes that are stored are those in the buffer at the time of creation
   (i.e. when the message has been decompressed and authenticated).

   CRC is not mentioned in section RFC3320-8.4 in the list of
   instructions that obey byte copying rules, but its description in
   section RFC3320-9.3.5 states that these rules are to be obeyed.  When
   reading data over which to perform the crc check, byte copying rules
   apply as specified in section RFC3320-8.4.

Spencer: again - is this text saying that CRC should be in the list?

   When the partial identifier for a STATE-FREE instruction is read,
   (during the execution of END-MESSAGE) byte copying rules as per
   section RFC3320-8.4 apply.

12.  RFC 3485 SIP/SDP Static Dictionary

   However, there are some minor bugs in the dictionary.  In a number of
   places, the entry in the offset table refers to an address that is
   not in the corresponding priority section in the list of strings.
   Consequently, if the bytecode uses the offset table and limits use of
   the dictionary to priorities less than 4, then care must be taken not
   to use the following strings in the dictionary:

      'application' at 0x0334 is not at priority 2 (it's priority 4)
      'sdp' at 0x064b is not at priority 2 (it's priority 4)
      'send' at 0x089d is not at priority 2 (it's priority 3)
      'recv' at 0x0553 is not at priority 2 (it's priority 4)
      'phone' at 0x00f2 is not at priority 3 (it's priority 4)

   These are seen to be relatively low cost bugs as only these 5 strings
   are affected and they are only affected under certain conditions.

Spencer: Nit - I can't make myself object to documenting this caution
instead of updating the dictionary specification (here or elsewhere), but I
wish I could... :-( Is the assumption that it's not worth the disruption of
bumping the version number?

15.  Acknowledgements

Spencer: Nit - the first sentence here isn't clear about who is being
foolish (and it's a nice comment that needs to be clear in order to be
clever).

   Largely through being foolish enough to be authors of, or to have
   implemented, SigComp we would like to thank the following:
      Richard Price
      Lajos Zaccomer
      Timo Forsman
      Tor-Erik Malen
      Jan Christoffersson
      Kwang Mien Chan
      William Kembery
      Pekka Pessi
   for their confusion, suggestions and comments.

Appendix A.  Dummy Application Protocol (DAP)

A.1  Introduction

   This appendix defines a simple dummy application protocol (DAP) that
   can be used for SigComp interoperability testing.  This is handy for
   SigComp implementations that are not integrated with SIP stack.  It
   also provides some features that facilitate the tests of SigComp
   internal operations.

Spencer: thank you guys for including this. I wish all our protocol specs
were this well-defined ;-}