Registration of media type video/H264-RCDO
Tom Kristensen
tom.kristensen at tandberg.com
Tue Oct 6 10:46:59 CEST 2009
Last week we submitted a new version of
draft-ietf-avt-rtp-h264-rcdo:
http://tools.ietf.org/html/draft-ietf-avt-rtp-h264-rcdo-03
In this version of the draft the complete media type registration was included.
The media subtype for RCDO for H.264 is allocated from the IETF tree.
The media type registration is attached to this message, for convenience.
Cheers,
-- Tom
-------------- next part --------------
Type name: video
Subtype name: H264-RCDO
Required parameters:
rate: Indicates the RTP timestamp clock rate. The rate value MUST
be 90000.
Optional parameters:
profile-level-id: A base16 RFC 3548 [9] (hexadecimal) representation
of the following three bytes in the sequence parameter set NAL
unit specified in H.264 [4]: 1) profile_idc, 2) a byte herein
referred to as profile-iop, composed of the values of
constraint_set0_flag, constraint_set1_flag, constraint_set2_flag,
and reserved_zero_5bits in bit-significance order, starting from
the most significant bit, and 3) level_idc.
RCDO is distinct from any profile. Consequently, the profile
value is 0 (no profile), and the profile_idc byte of the
profile-level-id parameter is equal to 0. An RCDO bitstream MUST
obey all the constraints of the Baseline profile. Therefore, only
constraint_set0_flag is equal to 1 in the profile-iop part of the
profile-level-id parameter; the remaining bits are set to 0.
If the profile-level-id parameter is used to indicate properties
of a NAL unit stream, it indicates the level that a decoder has to
support in order to comply with H.264 [4] when it decodes the
stream. If the profile-level-id parameter is used for capability
exchange or session setup procedure, it indicates the highest
level supported for the signaled profile.
For example, if a codec supports level 1.3, the profile-level-id
becomes 00800d, in which 00 indicates the "no profile" value, 80
indicates the constraints of the Baseline profile and 0d indicates
level 1.3. When level 2.1 is supported, the profile-level-id
becomes 008015.
If no profile-level-id is present, level 1 MUST be implied, i.e.
equivalent to profile-level-id 00800a.
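As an illustration of the three-byte layout described above, the following minimal Python sketch builds and parses a profile-level-id value. The function names are hypothetical and not part of the registration. It assumes that level_idc is ten times the level number (13 for level 1.3, 21 for level 2.1), which is consistent with the values 0d, 15, and 0a quoted above.

```python
def rcdo_profile_level_id(level_idc: int) -> str:
    """Build the hexadecimal profile-level-id for an RCDO stream."""
    profile_idc = 0x00  # "no profile" value used by RCDO
    profile_iop = 0x80  # only constraint_set0_flag (most significant bit) set
    if not 0 <= level_idc <= 255:
        raise ValueError("level_idc must fit in one byte")
    return bytes([profile_idc, profile_iop, level_idc]).hex()


def parse_profile_level_id(value: str = "00800a"):
    """Split a profile-level-id into (profile_idc, profile-iop, level_idc).

    The default corresponds to the value implied when the parameter is absent.
    """
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    profile_idc, profile_iop, level_idc = raw
    return profile_idc, profile_iop, level_idc


print(rcdo_profile_level_id(13))         # level 1.3
print(rcdo_profile_level_id(21))         # level 2.1
print(parse_profile_level_id("00800a"))  # level 1
```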
max-mbps, max-fs, max-cpb, max-dpb, and max-br: These parameters MAY
be used to signal the capabilities of a receiver implementation.
These parameters MUST NOT be used for any other purpose. The
profile-level-id parameter MUST be present in the same receiver
capability description that contains any of these parameters. The
level conveyed in the value of the profile-level-id parameter MUST
be one that the receiver is fully capable of supporting. max-mbps,
max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
capabilities of the receiver that extend the required
capabilities of the signaled level, as specified below.
When more than one parameter from the set (max-mbps, max-fs,
max-cpb, max-dpb, max-br) is present, the receiver MUST support all
signaled capabilities simultaneously. For example, if both
max-mbps and max-br are present, the signaled level with the extension
of both the frame rate and bit rate is supported. That is, the
receiver is able to decode NAL unit streams in which the
macroblock processing rate is up to max-mbps (inclusive), the bit
rate is up to max-br (inclusive), the coded picture buffer size is
derived as specified in the semantics of the max-br parameter
below, and other properties comply with the level specified in the
value of the profile-level-id parameter.
A receiver MUST NOT signal values of max-mbps, max-fs, max-cpb,
max-dpb, and max-br that meet the requirements of a higher level,
referred to as level A herein, compared to the level specified in
the value of the profile-level-id parameter, if the receiver can
support all the properties of level A.
Informative note: When the OPTIONAL MIME type parameters are
used to signal the properties of a NAL unit stream, max-mbps,
max-fs, max-cpb, max-dpb, and max-br are not present, and the
value of profile-level-id must always be such that the NAL
unit stream complies fully with the specified profile and
level.
max-mbps: The value of max-mbps is an integer indicating the maximum
macroblock processing rate in units of macroblocks per second.
The max-mbps parameter signals that the receiver is capable of
decoding video at a higher rate than is required by the signaled
level conveyed in the value of the profile-level-id parameter.
When max-mbps is signaled, the receiver MUST be able to decode NAL
unit streams that conform to the signaled level, with the
exception that the MaxMBPS value in Table A-1 of H.264 [4] for the
signaled level is replaced with the value of max-mbps. The value
of max-mbps MUST be greater than or equal to the value of MaxMBPS
for the level given in Table A-1 of H.264 [4]. Senders MAY use
this knowledge to send pictures of a given size at a higher
picture rate than is indicated in the signaled level.
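To illustrate how a sender might use a signaled max-mbps, the following Python sketch computes the highest picture rate that keeps the macroblock processing rate within the signaled value. The function name and the numbers in the example are illustrative assumptions, not values taken from this registration; the picture size itself must still fit the frame size limit of the signaled level (or max-fs, if signaled).

```python
def max_picture_rate(max_mbps: int, width_mbs: int, height_mbs: int) -> float:
    """Highest picture rate (pictures per second) allowed by max-mbps.

    width_mbs and height_mbs give the picture size in macroblocks.
    """
    mbs_per_picture = width_mbs * height_mbs
    if mbs_per_picture <= 0:
        raise ValueError("picture size must be positive")
    return max_mbps / mbs_per_picture


# Hypothetical example: a 352x288 picture is 22 x 18 = 396 macroblocks.
# With an assumed max-mbps of 11880 the sender may use up to 30 pictures/s.
print(max_picture_rate(11880, 22, 18))
```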
max-fs: The value of max-fs is an integer indicating the maximum
frame size in units of macroblocks. The max-fs parameter signals
that the receiver is capable of decoding larger picture sizes than
are required by the signaled level conveyed in the value of the
profile-level-id parameter. When max-fs is signaled, the receiver
MUST be able to decode NAL unit streams that conform to the
signaled level, with the exception that the MaxFS value in Table
A-1 of H.264 [4] for the signaled level is replaced with the value
of max-fs. The value of max-fs MUST be greater than or equal to
the value of MaxFS for the level given in Table A-1 of H.264 [4].
Senders MAY use this knowledge to send larger pictures at a
proportionally lower frame rate than is indicated in the signaled
level.
max-cpb: The value of max-cpb is an integer indicating the maximum
coded picture buffer size in units of 1000 bits for the VCL HRD
parameters (see A.3.1 item i of H.264 [4]) and in units of 1200
bits for the NAL HRD parameters (see A.3.1 item j of H.264 [4]).
The max-cpb parameter signals that the receiver has more memory
than the minimum amount of coded picture buffer memory required by
the signaled level conveyed in the value of the profile-level-id
parameter. When max-cpb is signaled, the receiver MUST be able to
decode NAL unit streams that conform to the signaled level, with
the exception that the MaxCPB value in Table A-1 of H.264 [4] for
the signaled level is replaced with the value of max-cpb. The
value of max-cpb MUST be greater than or equal to the value of
MaxCPB for the level given in Table A-1 of H.264 [4]. Senders MAY
use this knowledge to construct coded video streams with greater
variation of bit rate than can be achieved with the MaxCPB value
in Table A-1 of H.264 [4].
Informative note: The coded picture buffer is used in the
hypothetical reference decoder (Annex C) of H.264. The use of
the hypothetical reference decoder is recommended in H.264
encoders to verify that the produced bitstream conforms to the
standard and to control the output bitrate. Thus, the coded
picture buffer is conceptually independent of any other
potential buffers in the receiver, including de-interleaving
and de-jitter buffers. The coded picture buffer need not be
implemented in decoders as specified in Annex C of H.264;
rather, standard-compliant decoders can have any buffering
arrangements provided that they can decode standard-compliant
bitstreams. Thus, in practice, the input buffer for the video
decoder can be integrated with the de-interleaving and de-jitter
buffers of the receiver.
max-dpb: The value of max-dpb is an integer indicating the maximum
decoded picture buffer size in units of 1024 bytes. The max-dpb
parameter signals that the receiver has more memory than the
minimum amount of decoded picture buffer memory required by the
signaled level conveyed in the value of the profile-level-id
parameter. When max-dpb is signaled, the receiver MUST be able to
decode NAL unit streams that conform to the signaled level, with
the exception that the MaxDPB value in Table A-1 of H.264 [4] for
the signaled level is replaced with the value of max-dpb.
Consequently, a receiver that signals max-dpb MUST be capable of
storing the following number of decoded frames, complementary
field pairs, and non-paired fields in its decoded picture buffer:
Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs * 256 *
ChromaFormatFactor ), 16)
PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
defined in H.264 [4].
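The formula above can be sketched in Python as follows. The function name is hypothetical, and the default ChromaFormatFactor of 1.5 is an assumption corresponding to 4:2:0 video; the max-dpb value in the example is illustrative only. The formula is evaluated literally, so the result may be fractional; the number of whole frames that fit is its integer part.

```python
def dpb_capacity(max_dpb: int, pic_width_in_mbs: int,
                 frame_height_in_mbs: int,
                 chroma_format_factor: float = 1.5) -> float:
    """Evaluate Min(1024 * max-dpb / (frame size in bytes), 16).

    max_dpb is in units of 1024 bytes; each macroblock holds 256 luma
    samples, scaled by chroma_format_factor to account for chroma.
    """
    frame_bytes = (pic_width_in_mbs * frame_height_in_mbs
                   * 256 * chroma_format_factor)
    return min(1024 * max_dpb / frame_bytes, 16)


# Illustrative values: max-dpb = 891 and a 22 x 18 macroblock picture.
print(dpb_capacity(891, 22, 18))
```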
The value of max-dpb MUST be greater than or equal to the value of
MaxDPB for the level given in Table A-1 of H.264 [4]. Senders MAY
use this knowledge to construct coded video streams with improved
compression.
Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation
H.245, so as to facilitate signaling gateway designs. The
decoded picture buffer stores reconstructed samples and is a
property of the video decoder only. There is no relationship
between the size of the decoded picture buffer and the buffers
used in RTP, especially de-interleaving and de-jitter buffers.
max-br: The value of max-br is an integer indicating the maximum
video bit rate in units of 1000 bits per second for the VCL HRD
parameters (see A.3.1 item i of H.264 [4]) and in units of 1200
bits per second for the NAL HRD parameters (see A.3.1 item j of
H.264 [4]).
The max-br parameter signals that the video decoder of the
receiver is capable of decoding video at a higher bit rate than is
required by the signaled level conveyed in the value of the
profile-level-id parameter. The value of max-br MUST be greater
than or equal to the value of MaxBR for the level given in Table
A-1 of H.264 [4].
When max-br is signaled, the video codec of the receiver MUST be
able to decode NAL unit streams that conform to the signaled
level, conveyed in the profile-level-id parameter, with the
following exceptions in the limits specified by the level:
o The value of max-br replaces the MaxBR value of the signaled
  level (in Table A-1 of H.264 [4]).
o When the max-cpb parameter is not present, the result of the
  following formula replaces the value of MaxCPB in Table A-1 of
  H.264 [4]: (MaxCPB of the signaled level) * max-br / (MaxBR of
  the signaled level).
For example, if a receiver signals capability for Level 1.2 with
max-br equal to 1550, this indicates a maximum video bitrate of
1550 kbits/sec for VCL HRD parameters, a maximum video bitrate of
1860 kbits/sec for NAL HRD parameters, and a CPB size of 4036458
bits (1550000 / 384000 * 1000 * 1000).
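The arithmetic of this example can be reproduced with a short Python sketch. The function name is hypothetical. The Level 1.2 limits used in the example (MaxBR 384, MaxCPB 1000) are the ones implied by the figures 384000 and 1000 * 1000 in the calculation above; for any other level, the caller would have to supply the values from Table A-1 of H.264.

```python
def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int):
    """Return (VCL bit rate, NAL bit rate, VCL CPB size in bits).

    Assumes max-cpb is not signaled, so the CPB size is scaled as
    (MaxCPB of the signaled level) * max-br / (MaxBR of the signaled level).
    """
    if max_br < level_max_br:
        raise ValueError("max-br MUST NOT be below MaxBR of the level")
    vcl_bps = max_br * 1000  # VCL HRD parameters use units of 1000 bits/s
    nal_bps = max_br * 1200  # NAL HRD parameters use units of 1200 bits/s
    cpb_units = level_max_cpb * max_br / level_max_br
    vcl_cpb_bits = int(cpb_units * 1000)  # VCL CPB units are 1000 bits
    return vcl_bps, nal_bps, vcl_cpb_bits


# Level 1.2 with max-br = 1550, as in the example above.
print(max_br_limits(1550, level_max_br=384, level_max_cpb=1000))
```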
The value of max-br MUST be greater than or equal to the value of
MaxBR for the signaled level given in Table A-1 of H.264 [4].
Senders MAY use this knowledge to send higher bitrate video as
allowed in the level definition of Annex A of H.264, to achieve
improved video quality.
Informative note: This parameter was added primarily to
complement a similar codepoint in the ITU-T Recommendation
H.245, so as to facilitate signaling gateway designs. No
assumption can be made from the value of this parameter that
the network is capable of handling such bit rates at any given
time. In particular, no conclusion can be drawn that the
signaled bit rate is possible under congestion control
constraints.
redundant-pic-cap: This parameter signals the capabilities of a
receiver implementation. When equal to 0, the parameter indicates
that the receiver makes no attempt to use redundant coded pictures
to correct incorrectly decoded primary coded pictures. When equal
to 0, the receiver is not capable of using redundant slices;
therefore, a sender SHOULD avoid sending redundant slices to save
bandwidth. When equal to 1, the receiver is capable of decoding
any such redundant slice that covers a corrupted area in a primary
decoded picture (at least partly), and therefore a sender MAY send
redundant slices. When the parameter is not present, then a value
of 0 MUST be used for redundant-pic-cap. When present, the value
of redundant-pic-cap MUST be either 0 or 1.
When the profile-level-id parameter is present in the same
capability signaling as the redundant-pic-cap parameter, and the
profile indicated in profile-level-id is such that it disallows
the use of redundant coded pictures (e.g., Main Profile), the
value of redundant-pic-cap MUST be equal to 0. When a receiver
indicates redundant-pic-cap equal to 0, the received stream SHOULD
NOT contain redundant coded pictures.
Informative note: Even if redundant-pic-cap is equal to 0, the
decoder is able to ignore redundant coded pictures provided
that the decoder supports such a profile (Baseline, Extended)
in which redundant coded pictures are allowed.
Informative note: Even if redundant-pic-cap is equal to 1, the
receiver may also choose other error concealment strategies to
replace or complement decoding of redundant slices.
sprop-parameter-sets: This parameter MAY be used to convey any
sequence and picture parameter set NAL units (herein referred to
as the initial parameter set NAL units) that MUST precede any
other NAL units in decoding order. The parameter MUST NOT be used
to indicate codec capability in any capability exchange procedure.
The value of the parameter is the base64 RFC 3548 [9]
representation of the initial parameter set NAL units as specified
in sections 7.3.2.1 and 7.3.2.2 of H.264 [4]. The parameter sets
are conveyed in decoding order, and no framing of the parameter
set NAL units takes place. A comma is used to separate any pair
of parameter sets in the list. Note that the number of bytes in a
parameter set NAL unit is typically less than 10, but a picture
parameter set NAL unit can contain several hundred bytes.
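A minimal Python sketch of the encoding described above: each parameter set NAL unit is base64-encoded separately and the results are joined with commas. The function names are hypothetical, and the byte strings in the example are placeholders, not valid H.264 parameter sets.

```python
import base64


def encode_sprop_parameter_sets(nal_units):
    """Join base64-encoded NAL units with commas, in decoding order."""
    return ",".join(base64.b64encode(nal).decode("ascii") for nal in nal_units)


def decode_sprop_parameter_sets(value):
    """Split the parameter value on commas and base64-decode each part."""
    return [base64.b64decode(part) for part in value.split(",") if part]


# Placeholder bytes standing in for one sequence and one picture
# parameter set NAL unit (assumption: real units come from the encoder).
units = [b"\x67\x00\x80\x0a\x01", b"\x68\x02\x03"]
encoded = encode_sprop_parameter_sets(units)
print(encoded)
print(decode_sprop_parameter_sets(encoded) == units)
```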
Informative note: When several payload types are offered in the
SDP Offer/Answer model, each with its own sprop-parameter-sets
parameter, then the receiver cannot assume that those parameter
sets do not use conflicting storage locations (i.e., identical
values of parameter set identifiers). Therefore, a receiver
should double-buffer all sprop-parameter-sets and make them
available to the decoder instance that decodes a certain
payload type.
parameter-add: This parameter MAY be used to signal whether the
receiver of this parameter is allowed to add parameter sets in its
signaling response using the sprop-parameter-sets MIME parameter.
The value of this parameter is either 0 or 1. 0 is equal to false;
i.e., it is not allowed to add parameter sets. 1 is equal to true;
i.e., it is allowed to add parameter sets. If the parameter is
not present, its value MUST be 1.
packetization-mode: This parameter signals the properties of an RTP
payload type or the capabilities of a receiver implementation.
Only a single configuration point can be indicated; thus, when
capabilities to support more than one packetization-mode are
declared, multiple configuration points (RTP payload types) must
be used.
When the value of packetization-mode is equal to 0 or
packetization-mode is not present, the single NAL mode, as defined
in section 6.2 of RFC 3984, MUST be used. This mode is in use in
standards using ITU-T Recommendation H.241 [5] (see section 12.1).
When the value of packetization-mode is equal to 1, the non-
interleaved mode, as defined in section 6.3 of RFC 3984, MUST be
used. When the value of packetization-mode is equal to 2, the
interleaved mode, as defined in section 6.4 of RFC 3984, MUST be
used. The value of packetization-mode MUST be an integer in the
range of 0 to 2, inclusive.
sprop-interleaving-depth: This parameter MUST NOT be present when
packetization-mode is not present or the value of packetization-
mode is equal to 0 or 1. This parameter MUST be present when the
value of packetization-mode is equal to 2.
This parameter signals the properties of a NAL unit stream. It
specifies the maximum number of VCL NAL units that precede any VCL
NAL unit in the NAL unit stream in transmission order and follow
the VCL NAL unit in decoding order. Consequently, it is
guaranteed that receivers can reconstruct NAL unit decoding order
when the buffer size for NAL unit decoding order recovery is at
least the value of sprop-interleaving-depth + 1 in terms of VCL
NAL units.
The value of sprop-interleaving-depth MUST be an integer in the
range of 0 to 32767, inclusive.
sprop-deint-buf-req: This parameter MUST NOT be present when
packetization-mode is not present or the value of packetization-
mode is equal to 0 or 1. It MUST be present when the value of
packetization-mode is equal to 2.
sprop-deint-buf-req signals the required size of the
deinterleaving buffer for the NAL unit stream. The value of the
parameter MUST be greater than or equal to the maximum buffer
occupancy (in units of bytes) required in such a deinterleaving
buffer that is specified in section 7.2 of RFC 3984. It is
guaranteed that receivers can perform the deinterleaving of
interleaved NAL units into NAL unit decoding order, when the
deinterleaving buffer size is at least the value of sprop-deint-
buf-req in terms of bytes.
The value of sprop-deint-buf-req MUST be an integer in the range
of 0 to 4294967295, inclusive.
Informative note: sprop-deint-buf-req indicates the required
size of the deinterleaving buffer only. When network jitter
can occur, an appropriately sized jitter buffer has to be
provisioned for as well.
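The presence and range rules for sprop-interleaving-depth and sprop-deint-buf-req can be checked mechanically. The following Python sketch assumes the parameters have already been parsed into a dictionary of strings; the function name and the dictionary representation are illustrative assumptions, not part of this registration.

```python
def check_interleaving_params(params: dict) -> list:
    """Return a list of rule violations (empty if the set is consistent)."""
    errors = []
    # An absent packetization-mode is treated like the value 0.
    mode = int(params.get("packetization-mode", 0))
    if mode not in (0, 1, 2):
        errors.append("packetization-mode MUST be in the range 0 to 2")
    limits = (("sprop-interleaving-depth", 32767),
              ("sprop-deint-buf-req", 4294967295))
    for name, upper in limits:
        present = name in params
        if mode == 2 and not present:
            errors.append(f"{name} MUST be present when packetization-mode is 2")
        elif mode != 2 and present:
            errors.append(f"{name} MUST NOT be present unless packetization-mode is 2")
        if present and not 0 <= int(params[name]) <= upper:
            errors.append(f"{name} MUST be in the range 0 to {upper}")
    return errors


print(check_interleaving_params({"packetization-mode": "2",
                                 "sprop-interleaving-depth": "4",
                                 "sprop-deint-buf-req": "8000"}))
print(check_interleaving_params({"sprop-deint-buf-req": "8000"}))
```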
deint-buf-cap: This parameter signals the capabilities of a receiver
implementation and indicates the amount of deinterleaving buffer
space in units of bytes that the receiver has available for
reconstructing the NAL unit decoding order. A receiver is able to
handle any stream for which the value of the sprop-deint-buf-req
parameter is smaller than or equal to this parameter.
If the parameter is not present, then a value of 0 MUST be used
for deint-buf-cap. The value of deint-buf-cap MUST be an integer
in the range of 0 to 4294967295, inclusive.
Informative note: deint-buf-cap indicates the maximum possible
size of the deinterleaving buffer of the receiver only. When
network jitter can occur, an appropriately sized jitter buffer
has to be provisioned for as well.
sprop-init-buf-time: This parameter MAY be used to signal the
properties of a NAL unit stream. The parameter MUST NOT be
present, if the value of packetization-mode is equal to 0 or 1.
The parameter signals the initial buffering time that a receiver
MUST buffer before starting decoding to recover the NAL unit
decoding order from the transmission order. The parameter is the
maximum value of (transmission time of a NAL unit - decoding time
of the NAL unit), assuming reliable and instantaneous
transmission, the same timeline for transmission and decoding, and
that decoding starts when the first packet arrives.
An example of specifying the value of sprop-init-buf-time
follows. A NAL unit stream is sent in the following interleaved
order, in which the value corresponds to the decoding time and the
transmission order is from left to right:
0 2 1 3 5 4 6 8 7 ...
Assuming a steady transmission rate of NAL units, the transmission
times are:
0 1 2 3 4 5 6 7 8 ...
Subtracting the decoding time from the transmission time column-
wise results in the following series:
0 -1 1 0 -1 1 0 -1 1 ...
Thus, in terms of intervals of NAL unit transmission times, the
value of sprop-init-buf-time in this example is 1.
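The same calculation can be sketched in Python. The function name is hypothetical, and the sketch keeps the example's simplifying assumption that NAL units are sent at a steady rate, so that the transmission time of each unit equals its index. The result is in NAL unit transmission intervals, not in 90-kHz clock ticks.

```python
def init_buf_time_in_intervals(decoding_times):
    """Maximum of (transmission time - decoding time) over all NAL units.

    decoding_times lists the decoding time of each NAL unit in
    transmission order; the transmission time is taken to be the index.
    """
    return max(tx - dec for tx, dec in enumerate(decoding_times))


# Interleaved order from the example above.
print(init_buf_time_in_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7]))
```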
The parameter is coded as a non-negative base10 integer
representation in clock ticks of a 90-kHz clock. If the
parameter is not present, then no initial buffering time value is
defined. Otherwise the value of sprop-init-buf-time MUST be an
integer in the range of 0 to 4294967295, inclusive.
In addition to the signaled sprop-init-buf-time, receivers SHOULD
take into account the transmission delay jitter buffering,
including buffering for the delay jitter caused by mixers,
translators, gateways, proxies, traffic-shapers, and other network
elements.
sprop-max-don-diff: This parameter MAY be used to signal the
properties of a NAL unit stream. It MUST NOT be used to signal
transmitter or receiver or codec capabilities. The parameter MUST
NOT be present if the value of packetization-mode is equal to 0 or
1. sprop-max-don-diff is an integer in the range of 0 to 32767,
inclusive. If sprop-max-don-diff is not present, the value of the
parameter is unspecified. sprop-max-don-diff is calculated as
follows:
sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
for any i and any j>i,
where i and j indicate the index of the NAL unit in the
transmission order and AbsDON denotes a decoding order number of
the NAL unit that does not wrap around to 0 after 65535. In other
words, AbsDON is calculated as follows: Let m and n be consecutive
NAL units in transmission order. For the very first NAL unit in
transmission order (whose index is 0), AbsDON(0) = DON(0). For
other NAL units, AbsDON is calculated as follows:
If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))
If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
where DON(i) is the decoding order number of the NAL unit having
index i in the transmission order. The decoding order number is
specified in section 5.5 of RFC 3984.
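The AbsDON rules and the sprop-max-don-diff formula above translate directly into code. The following Python sketch uses hypothetical function names and takes the 16-bit DON values in transmission order. It evaluates the maximum literally over all pairs with j > i, and returns 0 for a stream with fewer than two NAL units, which is an assumption made here because the formula has no pairs to compare in that case.

```python
def abs_dons(dons):
    """Unwrap 16-bit DON values (transmission order) into AbsDON values."""
    result = [dons[0]]  # AbsDON(0) = DON(0)
    for m, n in zip(dons, dons[1:]):  # consecutive units in transmission order
        prev = result[-1]
        if m == n:
            cur = prev
        elif m < n and n - m < 32768:
            cur = prev + n - m
        elif m > n and m - n >= 32768:
            cur = prev + 65536 - m + n    # DON wrapped around forwards
        elif m < n and n - m >= 32768:
            cur = prev - (m + 65536 - n)  # DON wrapped around backwards
        else:  # m > n and m - n < 32768
            cur = prev - (m - n)
        result.append(cur)
    return result


def sprop_max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i."""
    a = abs_dons(dons)
    return max((a[i] - a[j]
                for i in range(len(a))
                for j in range(i + 1, len(a))), default=0)


print(abs_dons([65534, 65535, 1, 0]))
print(sprop_max_don_diff([0, 2, 1, 3]))
```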
Informative note: Receivers may use sprop-max-don-diff to
determine which NAL units in the receiver buffer can be passed to
the decoder.
max-rcmd-nalu-size: This parameter MAY be used to signal the
capabilities of a receiver. The parameter MUST NOT be used for
any other purposes. The value of the parameter indicates the
largest NALU size in bytes that the receiver can handle
efficiently. The parameter value is a recommendation, not a
strict upper boundary. The sender MAY create larger NALUs but
must be aware that the handling of these may come at a higher cost
than NALUs conforming to the limitation.
The value of max-rcmd-nalu-size MUST be an integer in the range of
0 to 4294967295, inclusive. If this parameter is not specified,
no known limitation to the NALU size exists. Senders still have
to consider the MTU size available between the sender and the
receiver and SHOULD run MTU discovery for this purpose.
This parameter is motivated by, for example, an IP to H.223 video
telephony gateway, where NALUs smaller than the H.223 transport
data unit will be more efficient. A gateway may terminate IP;
thus, MTU discovery will normally not work beyond the gateway.
Informative note: Setting this parameter to a lower than
necessary value may have a negative impact.
Encoding considerations: This type is only defined for transfer via
RTP (RFC 3550) and is framed and binary, see section 4.8 in
RFC4288.
Security considerations: See section X of RFC XXXX.
Interoperability considerations: None
Published specification: RFC XXXX and its reference section.
Applications that use this media type: None
Additional information: None
Magic number(s):
File extension(s):
Macintosh file type code(s):
Person & email address to contact for further information:
Tom Kristensen <tom.kristensen at tandberg.com>, <tomkri at ifi.uio.no>
Intended usage: COMMON
Restrictions on usage: This media type depends on RTP framing, and
hence is only defined for transfer via RTP, ref RFC3550.
Transport within other framing protocols is not defined at this
time.
Author: Tom Kristensen
Change controller: IETF Audio/Video Transport working group
delegated from the IESG.