Document: draft-ietf-avt-rtp-vorbis-06
Reviewer: Spencer Dawkins
Review Date: 26 Oct 2007 (sorry, late!)
IETF LC End Date:   22 Oct 2007
IESG Telechat date: 01 Nov 2007

Summary: This document is close to ready for publication as a Proposed Standard. I have a small number of questions, mostly involving clarity or
2119 language. Please take a look at the question in 7.1, especially.

Comments: I have included "nits" in this review for the convenience of authors and editors later in the process. Nits are not part of a Gen-ART review.

1.  Introduction

   Vorbis is a general purpose perceptual audio codec intended to allow
   maximum encoder flexibility, thus allowing it to scale competitively
   over an exceptionally wide range of bitrates.  At the high quality/
   bitrate end of the scale (CD or DAT rate stereo, 16/24 bits), it is
   in the same league as AAC.  Vorbis is also intended for lower and

Spencer (nit): AAC? has not been expanded previously.

   higher sample rates (from 8kHz telephony to 192kHz digital masters)
   and a range of channel representations (monaural, polyphonic, stereo,
   quadraphonic, 5.1, ambisonic, or up to 255 discrete channels).

2.2.  Payload Header

   Vorbis Data Type (VDT): 2 bits

   This field specifies the kind of Vorbis data stored in this RTP
   packet.  There are currently three different types of Vorbis
   payloads.  Each packet MUST contain only a single type of Vorbis
   payload (e.g. you MUST not aggregate configuration and comment

Spencer: this is close to a nit, but 2119 language is important. This is just restating the previous MUST. I'd suggest either "must" in lower case or "MUST NOT" - if there's a reason to have two 2119 requirements that say the same thing.

   payload in the same packet)

      0 = Raw Vorbis payload
      1 = Vorbis Packed Configuration payload
      2 = Legacy Vorbis Comment payload
      3 = Reserved

   The packets with a VDT of value 3 MUST be ignored

   The last 4 bits represent the number of complete packets in this
   payload.  This provides for a maximum number of 15 Vorbis packets in
   the payload.  If the packet contains fragmented data the number of
   packets MUST be set to 0.

Spencer (nit): what type of fragmentation is this? Please provide an adjective :-)

2.3.  Payload Data

   The Vorbis packet length header is the length of the Vorbis data
   block only and does not count the length field.

Spencer (nit): s/count/include/, I think.

   The payload packing of the Vorbis data packets MUST follow the
   guidelines set-out in [3] where the oldest packet occurs immediately

Spencer: again, adjectives are good. This is saying "the oldest Vorbis packet", right? It would be better if the specification doesn't use language like "the oldest packet in the packet" with no adjectives - that doesn't take us to a good place.

   after the RTP packet header.  Subsequent packets, if any, MUST follow
   in temporal order.

Spencer: "Subsequent Vorbis packets", right? what does the receiver do if the "follow in temporal order" MUST is violated?

3.1.1.  Packed Configuration

   A Vorbis Packed Configuration is indicated with the Vorbis Data Type
   field set to 1.  Of the three headers defined in the Vorbis I
   specification [10], the identification and the setup MUST be packed
   as they are, while the comment header MAY be replaced with a dummy
   one.  The packed configuration follows a generic way to store xiph
   codec configurations: The first field stores the number of the
   following packets minus one (count field), the next ones represent
   the size of the headers (length fields), the headers immediately
   follow the list of length fields.  The size of the last header is
   implicit.  The count and the length fields are encoded using the
   following logic: the data is in network order, every byte has the

Spencer (nit): c/network order/network byte order/, and there are multiple occurrences in the document...

   most significant bit used as flag and the following 7 used to store
   the value.  The first N bit are to be taken, where N is number of
   bits representing the value modulo 7, and stored in the first byte.
   If there are more bits, the flag bit is set to 1 and the subsequent
   7bit are stored in the following byte, if there are remaining bits
   set the flag to 1 and the same procedure is repeated.  The ending
   byte has the flag bit set to 0.  In order to decode it is enough to
   iterate over the bytes until the flag bit set to 0, for every byte
   the data is added to the accumulated value multiplied by 128.  The
   headers are packed in the same order they are present in ogg:
   identification, comment, setup.

3.2.  Out of Band Transmission

   This section, as stated above, does not cover all the possible out-
   of-band delivery methods since they rely on different protocols and
   are linked to specific applications.  The following packet definition
   SHOULD be used in out-of-band delivery and MUST be used when

Spencer: is there an obvious reason to violate the SHOULD?

   Configuration is inlined in the SDP.

5.1.  Example Fragmented Vorbis Packet

   The Fragment type field is set to 2 and the number of packets field
   is set to 0.  For large Vorbis fragments there can be several of
   these type of payload packets.  The maximum packet size SHOULD be no

Spencer (nit): s/these type/this type/?

Spencer: why is this a SHOULD?

   greater than the path MTU, including all RTP and payload headers.
   The sequence number has been incremented by one but the timestamp
   field remains the same as the initial packet.

5.2.  Packet Loss

   As there is no error correction within the Vorbis stream, packet loss
   will result in a loss of signal.  Packet loss is more of an issue for
   fragmented Vorbis packets as the client will have to cope with the
   handling of the Fragment Type.  In case of loss of fragments the
   client MUST discard all the remaining fragments and decode the

Spencer (nit) - "remaining Vorbis fragments" and "incomplete Vorbis packet"?

   incomplete packet.  If we use the fragmented Vorbis packet example
   above and the first packet is lost the client MUST detect that the

Spencer (nit) - "and the first RTP packet is lost"?

   next packet has the packet count field set to 0 and the Fragment type
   2 and MUST drop it.  The next packet, which is the final fragmented
   packet, MUST be dropped in the same manner.  If the missing packet is
   the last, the received two fragments will be kept and the incomplete
   vorbis packet decoded.

6.  IANA Considerations

      configuration-uri:  the URI [4] of the configuration headers in
         case of out of band transmission.  In the form of
         "protocol://path/to/resource/", depending on the specific

Spencer: isn't this "scheme://path/to/resource"?

         method, a single configuration packet could be retrived by its
         Ident number, or multiple packets could be aggregated in a
         single stream.  Non hierarchical protocols MAY point to a
         resource using their specific syntax.

7.1.  Mapping Media Type Parameters into SDP

   The information carried in the Media Type media type specification
   has a specific mapping to fields in the Session Description Protocol
   (SDP) [5], which is commonly used to describe RTP sessions.  When SDP
   is used to specify sessions the mapping are as follows:

Spencer: is there anything you can say about Receiver behavior when the Media Type and SDP don't map correctly?

7.1.1.  SDP Example

   The following example shows a basic SDP single stream.  The first
   configuration packet is inlined in the sdp, other configurations

Spencer (nit): it would be great if SDP, URI etc. were consistently capitalized.

   could be fetched at any time from the first provided uri using or all

Spencer: is this "the URI currently in use"? but the sentence doesn't parse as written.

   the known configuration could be downloaded using the second uri.
   The inline base64 [9] configuration string is trimmed because of the

Spencer: is this "is folded in this example due to RFC line length limitations"?

   length.
      c=IN IP4 192.0.2.1
      m=audio RTP/AVP 98
      a=rtpmap:98 vorbis/44100/2
      a=fmtp:98 delivery-method=inline;
      configuration=AAAAAZ2f4g9NAh4aAXZvcmJpcwA...; delivery-
      method=out_band; configuration-uri=rtsp://path/to/the/resource;
      delivery-method=out_band;
      configuration-uri=http://another/path/to/resource/;

   Note that the payload format (encoding) names are commonly shown in
   upper case.  Media Type subtypes are commonly shown in lower case.
   These names are case-insensitive in both places.  Similarly,
   parameter names are case-insensitive both in Media Type types and in
   the default mapping to the SDP a=fmtp attribute.  The exception
   regarding case sensitivity is the configuration-uri URI which MUST be
   regarded as being case sensitive.  The a=fmtp line is a single line

Spencer: "shown as multiple lines in this document for clarity"?

   even if it is presented broken because of clarity.

7.2.  Usage with the SDP Offer/Answer Model

   The only paramenter negotiable is the delivery method.  All the

Spencer (nit): c/paramenter negotiable/negotiable parameter/

   others are declarative: the offer, as described in An Offer/Answer
   Model Session Description Protocol [8], may contain a large number of
   delivery methods per single fmtp attribute, the answerer MUST remove
   every delivery-method and configuration-uri not supported.  All the
   parameters MUST not be altered on answer otherwise.

8.  Congestion Control

   Vorbis clients SHOULD send regular receiver reports detailing

Spencer: is there a well-understood definition of "regular" within this community?

   congestion.  A mechanism for dynamically downgrading the stream,
   known as bitrate peeling, will allow for a graceful backing off of
   the stream bitrate.  This feature is not available at present so an
   alternative would be to redirect the client to a lower bitrate stream
   if one is available.

9.1.  Stream Radio

   This is one of the most common situation: one single server streaming
   content in multicast, the clients may start a session at random time.
   The content itself could be a mix of live stream, as the wj's voice,

Spencer (nit): "wj's"? please spell this out (I'm guessing what this means)

   and stored streams as the music she plays.