Document: draft-ietf-avt-rtp-vorbis-06
Reviewer: Spencer Dawkins
Review Date: 26 Oct 2007 (sorry, late!)
IETF LC End Date: 22 Oct 2007
IESG Telechat date: 01 Nov 2007

Summary: This document is close to ready for publication as a Proposed Standard. I have a small number of questions, mostly involving clarity or 2119 language. Please take a look at the question in 7.1, especially.

Comments: I have included "nits" in this review for the convenience of authors and editors later in the process. Nits are not part of a Gen-ART review.

1. Introduction

   Vorbis is a general purpose perceptual audio codec intended to allow maximum encoder flexibility, thus allowing it to scale competitively over an exceptionally wide range of bitrates. At the high quality/bitrate end of the scale (CD or DAT rate stereo, 16/24 bits), it is in the same league as AAC. Vorbis is also intended for lower and

Spencer (nit): AAC? has not been expanded previously.

   higher sample rates (from 8kHz telephony to 192kHz digital masters) and a range of channel representations (monaural, polyphonic, stereo, quadraphonic, 5.1, ambisonic, or up to 255 discrete channels).

2.2. Payload Header

   Vorbis Data Type (VDT): 2 bits

   This field specifies the kind of Vorbis data stored in this RTP packet. There are currently three different types of Vorbis payloads. Each packet MUST contain only a single type of Vorbis payload (e.g. you MUST not aggregate configuration and comment

Spencer: this is close to a nit, but 2119 language is important. This is just restating the previous MUST. I'd suggest either "must" in lower case or "MUST NOT" - if there's a reason to have two 2119 requirements that say the same thing.

   payload in the same packet)

      0 = Raw Vorbis payload
      1 = Vorbis Packed Configuration payload
      2 = Legacy Vorbis Comment payload
      3 = Reserved

   The packets with a VDT of value 3 MUST be ignored

   The last 4 bits represent the number of complete packets in this payload.
   This provides for a maximum number of 15 Vorbis packets in the payload. If the packet contains fragmented data the number of packets MUST be set to 0.

Spencer (nit): what type of fragmentation is this? Please provide an adjective :-)

2.3. Payload Data

   The Vorbis packet length header is the length of the Vorbis data block only and does not count the length field.

Spencer (nit): s/count/include/, I think.

   The payload packing of the Vorbis data packets MUST follow the guidelines set-out in [3] where the oldest packet occurs immediately

Spencer: again, adjectives are good. This is saying "the oldest Vorbis packet", right? It would be better if the specification doesn't use language like "the oldest packet in the packet" with no adjectives - that doesn't take us to a good place.

   after the RTP packet header. Subsequent packets, if any, MUST follow in temporal order.

Spencer: "Subsequent Vorbis packets", right? what does the receiver do if the "follow in temporal order" MUST is violated?

3.1.1. Packed Configuration

   A Vorbis Packed Configuration is indicated with the Vorbis Data Type field set to 1. Of the three headers defined in the Vorbis I specification [10], the identification and the setup MUST be packed as they are, while the comment header MAY be replaced with a dummy one.

   The packed configuration follows a generic way to store xiph codec configurations: The first field stores the number of the following packets minus one (count field), the next ones represent the size of the headers (length fields), the headers immediately follow the list of length fields. The size of the last header is implicit.

   The count and the length fields are encoded using the following logic: the data is in network order, every byte has the

Spencer (nit): c/network order/network byte order/, and there are multiple occurrences in the document...

   most significant bit used as flag and the following 7 used to store the value.
   The first N bit are to be taken, where N is number of bits representing the value modulo 7, and stored in the first byte. If there are more bits, the flag bit is set to 1 and the subsequent 7bit are stored in the following byte, if there are remaining bits set the flag to 1 and the same procedure is repeated. The ending byte has the flag bit set to 0. In order to decode it is enough to iterate over the bytes until the flag bit set to 0, for every byte the data is added to the accumulated value multiplied by 128.

   The headers are packed in the same order they are present in ogg: identification, comment, setup.

3.2. Out of Band Transmission

   This section, as stated above, does not cover all the possible out-of-band delivery methods since they rely on different protocols and are linked to specific applications. The following packet definition SHOULD be used in out-of-band delivery and MUST be used when

Spencer: is there an obvious reason to violate the SHOULD?

   Configuration is inlined in the SDP.

5.1. Example Fragmented Vorbis Packet

   The Fragment type field is set to 2 and the number of packets field is set to 0. For large Vorbis fragments there can be several of these type of payload packets. The maximum packet size SHOULD be no

Spencer (nit): s/these type/this type/?

Spencer: why is this a SHOULD?

   greater than the path MTU, including all RTP and payload headers. The sequence number has been incremented by one but the timestamp field remains the same as the initial packet.

5.2. Packet Loss

   As there is no error correction within the Vorbis stream, packet loss will result in a loss of signal. Packet loss is more of an issue for fragmented Vorbis packets as the client will have to cope with the handling of the Fragment Type. In case of loss of fragments the client MUST discard all the remaining fragments and decode the

Spencer (nit) - "remaining Vorbis fragments" and "incomplete Vorbis packet"?

   incomplete packet.
   If we use the fragmented Vorbis packet example above and the first packet is lost the client MUST detect that the

Spencer (nit) - "and the first RTP packet is lost"?

   next packet has the packet count field set to 0 and the Fragment type 2 and MUST drop it. The next packet, which is the final fragmented packet, MUST be dropped in the same manner. If the missing packet is the last, the received two fragments will be kept and the incomplete vorbis packet decoded.

6. IANA Considerations

   configuration-uri: the URI [4] of the configuration headers in case of out of band transmission. In the form of "protocol://path/to/resource/", depending on the specific

Spencer: isn't this "scheme://path/to/resource"?

   method, a single configuration packet could be retrived by its Ident number, or multiple packets could be aggregated in a single stream. Non hierarchical protocols MAY point to a resource using their specific syntax.

7.1. Mapping Media Type Parameters into SDP

   The information carried in the Media Type media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [5], which is commonly used to describe RTP sessions. When SDP is used to specify sessions the mapping are as follows:

Spencer: is there anything you can say about Receiver behavior when the Media Type and SDP don't map correctly?

7.1.1. SDP Example

   The following example shows a basic SDP single stream. The first configuration packet is inlined in the sdp, other configurations

Spencer (nit): it would be great if SDP, URI etc. were consistently capitalized.

   could be fetched at any time from the first provided uri using or all

Spencer: is this "the URI currently in use"? but the sentence doesn't parse as written.

   the known configuration could be downloaded using the second uri. The inline base64 [9] configuration string is trimmed because of the

Spencer: is this "is folded in this example due to RFC line length limitations"?

   length.
      c=IN IP4 192.0.2.1
      m=audio RTP/AVP 98
      a=rtpmap:98 vorbis/44100/2
      a=fmtp:98 delivery-method=inline;
      configuration=AAAAAZ2f4g9NAh4aAXZvcmJpcwA...;
      delivery-method=out_band;
      configuration-uri=rtsp://path/to/the/resource;
      delivery-method=out_band;
      configuration-uri=http://another/path/to/resource/;

   Note that the payload format (encoding) names are commonly shown in upper case. Media Type subtypes are commonly shown in lower case. These names are case-insensitive in both places. Similarly, parameter names are case-insensitive both in Media Type types and in the default mapping to the SDP a=fmtp attribute. The exception regarding case sensitivity is the configuration-uri URI which MUST be regarded as being case sensitive. The a=fmtp line is a single line

Spencer: "shown as multiple lines in this document for clarity"?

   even if it is presented broken because of clarity.

7.2. Usage with the SDP Offer/Answer Model

   The only paramenter negotiable is the delivery method. All the

Spencer (nit): c/paramenter negotiable/negotiable parameter/

   others are declarative: the offer, as described in An Offer/Answer Model Session Description Protocol [8], may contain a large number of delivery methods per single fmtp attribute, the answerer MUST remove every delivery-method and configuration-uri not supported. All the parameters MUST not be altered on answer otherwise.

8. Congestion Control

   Vorbis clients SHOULD send regular receiver reports detailing

Spencer: is there a well-understood definition of "regular" within this community?

   congestion. A mechanism for dynamically downgrading the stream, known as bitrate peeling, will allow for a graceful backing off of the stream bitrate. This feature is not available at present so an alternative would be to redirect the client to a lower bitrate stream if one is available.

9.1. Stream Radio

   This is one of the most common situation: one single server streaming content in multicast, the clients may start a session at random time.
   The content itself could be a mix of live stream, as the wj's voice,

Spencer (nit): "wj's"? please spell this out (I'm guessing what this means)

   and stored streams as the music she plays.
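
As an aid to checking the count and length field coding quoted from Section 3.1.1, the decode rule (iterate over the bytes until the flag bit is 0, multiplying the accumulated value by 128 and adding each byte's 7 data bits) can be sketched in Python. This is a minimal sketch under stated assumptions, not text from the draft: the function names decode_field and encode_field are hypothetical, and the encoder is inferred from the decode rule rather than taken from the draft's wording.

```python
from typing import Tuple


def decode_field(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one count/length field starting at buf[pos].

    Returns (value, next_pos). Each byte carries 7 data bits; the most
    significant bit is a flag meaning "another byte follows".
    """
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated field: flag bit set on last byte")
        byte = buf[pos]
        pos += 1
        # Accumulated value multiplied by 128, plus this byte's 7 data bits.
        value = value * 128 + (byte & 0x7F)
        if not (byte & 0x80):  # flag bit 0 marks the ending byte
            return value, pos


def encode_field(value: int) -> bytes:
    """Encode a non-negative integer so that decode_field recovers it.

    Assumption: most significant 7-bit group first, flag bit set on every
    byte except the last.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()  # most significant group goes first
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])
```

Under these assumptions a length of 300 would be coded as the two bytes 0x82 0x2C, since 2 * 128 + 44 = 300.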