Registration of media type video/H264-RCDO

Tue Oct 6 10:46:59 CEST 2009

Last week we submitted a new version of
draft-ietf-avt-rtp-h264-rcdo:
  http://tools.ietf.org/html/draft-ietf-avt-rtp-h264-rcdo-03

In this version of the draft the complete media type registration was included.

The media subtype for RCDO for H.264 is allocated from the IETF tree.

The media type registration is attached to this message, for convenience.

Cheers,
-- Tom

-------------- next part --------------
   Type name: video

   Subtype name: H264-RCDO

   Required parameters:

   rate:  Indicates the RTP timestamp clock rate.  The rate value MUST
      be 90000.

   Optional parameters:

   profile-level-id:  A base16 RFC 3548 [9] (hexadecimal) representation
      of the following three bytes in the sequence parameter set NAL
      unit specified in H.264 [4]: 1) profile_idc, 2) a byte herein
      referred to as profile-iop, composed of the values of
      constraint_set0_flag, constraint_set1_flag, constraint_set2_flag,
      and reserved_zero_5bits in bit-significance order, starting from
      the most significant bit, and 3) level_idc.

      RCDO is distinct from any profile.  This implies that the profile
      value is 0 (no profile) and that the profile_idc byte of the
      profile-level-id parameter is equal to 0.  An RCDO bitstream MUST
      obey all the constraints of the Baseline profile.  Therefore, only
      constraint_set0_flag is equal to 1 in the profile-iop part of the
      profile-level-id parameter; the remaining bits are set to 0.

      If the profile-level-id parameter is used to indicate properties
      of a NAL unit stream, it indicates the level that a decoder has to
      support in order to comply with H.264 [4] when it decodes the
      stream.  If the profile-level-id parameter is used for capability
      exchange or session setup procedure, it indicates the highest
      level supported for the signaled profile.

      For example, if a codec supports level 1.3, the profile-level-id
      becomes 00800d, in which 00 indicates the "no profile" value, 80
      indicates the constraints of the Baseline profile and 0d indicates
      level 1.3.  When level 2.1 is supported, the profile-level-id
      becomes 008015.

      If no profile-level-id is present, level 1 MUST be implied, i.e.
      equivalent to profile-level-id 00800a.
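
      The byte layout described above can be sketched in a few lines of
      Python.  This is only an illustration: the function and constant
      names are hypothetical, and the sketch assumes that level_idc is
      ten times the level number, which is consistent with the examples
      above (0d for level 1.3, 15 for level 2.1, 0a for level 1).

```python
from typing import Tuple

RCDO_PROFILE_IDC = 0x00  # "no profile", per the text above
RCDO_PROFILE_IOP = 0x80  # only constraint_set0_flag (most significant bit) is set


def build_profile_level_id(level_idc: int) -> str:
    """Return the 6-hex-digit profile-level-id for an RCDO stream."""
    if not 0 <= level_idc <= 0xFF:
        raise ValueError("level_idc must fit in one byte")
    return bytes([RCDO_PROFILE_IDC, RCDO_PROFILE_IOP, level_idc]).hex()


def parse_profile_level_id(value: str = "00800a") -> Tuple[int, int, int]:
    """Split a profile-level-id into (profile_idc, profile-iop, level_idc).

    The default argument models an absent parameter, for which
    level 1 (00800a) is implied.
    """
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be exactly three bytes")
    return raw[0], raw[1], raw[2]


print(build_profile_level_id(13))        # level 1.3
print(parse_profile_level_id("008015"))  # level 2.1
```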

   max-mbps, max-fs, max-cpb, max-dpb, and max-br:  These parameters MAY
      be used to signal the capabilities of a receiver implementation.
      These parameters MUST NOT be used for any other purpose.  The
      profile-level-id parameter MUST be present in the same receiver
      capability description that contains any of these parameters.  The
      level conveyed in the value of the profile-level-id parameter MUST
      be one that the receiver is fully capable of supporting.  max-mbps,
      max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
      capabilities of the receiver that extend the required
      capabilities of the signaled level, as specified below.

      When more than one parameter from the set (max-mbps, max-fs,
      max-cpb, max-dpb, max-br) is present, the receiver MUST support all
      signaled capabilities simultaneously.  For example, if both max-
      mbps and max-br are present, the signaled level with the extension
      of both the frame rate and bit rate is supported.  That is, the
      receiver is able to decode NAL unit streams in which the
      macroblock processing rate is up to max-mbps (inclusive), the bit
      rate is up to max-br (inclusive), the coded picture buffer size is
      derived as specified in the semantics of the max-br parameter
      below, and other properties comply with the level specified in the
      value of the profile-level-id parameter.

      A receiver MUST NOT signal values of max-mbps, max-fs, max-cpb,
      max-dpb, and max-br that meet the requirements of a higher level,
      referred to as level A herein, compared to the level specified in
      the value of the profile-level-id parameter, if the receiver can
      support all the properties of level A.

         Informative note: When the OPTIONAL MIME type parameters are
         used to signal the properties of a NAL unit stream, max-mbps,
         max-fs, max-cpb, max-dpb, and max-br are not present, and the
         value of profile-level-id must always be such that the NAL
         unit stream complies fully with the specified profile and
         level.

   max-mbps:  The value of max-mbps is an integer indicating the maximum
      macroblock processing rate in units of macroblocks per second.
      The max-mbps parameter signals that the receiver is capable of
      decoding video at a higher rate than is required by the signaled
      level conveyed in the value of the profile-level-id parameter.
      When max-mbps is signaled, the receiver MUST be able to decode NAL
      unit streams that conform to the signaled level, with the
      exception that the MaxMBPS value in Table A-1 of H.264 [4] for the
      signaled level is replaced with the value of max-mbps.  The value
      of max-mbps MUST be greater than or equal to the value of MaxMBPS
      for the level given in Table A-1 of H.264 [4].  Senders MAY use
      this knowledge to send pictures of a given size at a higher
      picture rate than is indicated in the signaled level.

   max-fs:  The value of max-fs is an integer indicating the maximum
      frame size in units of macroblocks.  The max-fs parameter signals
      that the receiver is capable of decoding larger picture sizes than
      are required by the signaled level conveyed in the value of the
      profile-level-id parameter.  When max-fs is signaled, the receiver
      MUST be able to decode NAL unit streams that conform to the
      signaled level, with the exception that the MaxFS value in Table
      A-1 of H.264 [4] for the signaled level is replaced with the value
      of max-fs.  The value of max-fs MUST be greater than or equal to
      the value of MaxFS for the level given in Table A-1 of H.264 [4].
      Senders MAY use this knowledge to send larger pictures at a
      proportionally lower frame rate than is indicated in the signaled
      level.
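
      As a rough illustration of how a sender might apply max-mbps and
      max-fs together, the sketch below checks a candidate picture size
      and picture rate against the effective limits.  The function name
      and arguments are hypothetical.  The caller is assumed to pass
      either the signaled parameter value or, when the parameter is
      absent, the Table A-1 value for the signaled level.  No other
      level limits are checked here.

```python
def fits_receiver_limits(width_mbs: int, height_mbs: int,
                         frames_per_second: float,
                         effective_max_fs: int,
                         effective_max_mbps: int) -> bool:
    """Check a picture size and rate against frame-size and MB-rate limits."""
    frame_size = width_mbs * height_mbs        # macroblocks per frame
    mb_rate = frame_size * frames_per_second   # macroblocks per second
    return frame_size <= effective_max_fs and mb_rate <= effective_max_mbps


# Hypothetical limits: 396 macroblocks per frame, 11880 macroblocks per second.
print(fits_receiver_limits(22, 18, 30, 396, 11880))
print(fits_receiver_limits(22, 18, 31, 396, 11880))
```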

   max-cpb:  The value of max-cpb is an integer indicating the maximum
      coded picture buffer size in units of 1000 bits for the VCL HRD
      parameters (see A.3.1 item i of H.264 [4]) and in units of 1200
      bits for the NAL HRD parameters (see A.3.1 item j of H.264 [4]).
      The max-cpb parameter signals that the receiver has more memory
      than the minimum amount of coded picture buffer memory required by
      the signaled level conveyed in the value of the profile-level-id
      parameter.  When max-cpb is signaled, the receiver MUST be able to
      decode NAL unit streams that conform to the signaled level, with
      the exception that the MaxCPB value in Table A-1 of H.264 [4] for
      the signaled level is replaced with the value of max-cpb.  The
      value of max-cpb MUST be greater than or equal to the value of
      MaxCPB for the level given in Table A-1 of H.264 [4].  Senders MAY
      use this knowledge to construct coded video streams with greater
      variation of bit rate than can be achieved with the MaxCPB value
      in Table A-1 of H.264 [4].

         Informative note: The coded picture buffer is used in the
         hypothetical reference decoder (Annex C) of H.264.  The use of
         the hypothetical reference decoder is recommended in H.264
         encoders to verify that the produced bitstream conforms to the
         standard and to control the output bitrate.  Thus, the coded
         picture buffer is conceptually independent of any other
         potential buffers in the receiver, including de-interleaving
         and de-jitter buffers.  The coded picture buffer need not be
         implemented in decoders as specified in Annex C of H.264, but
         rather standard-compliant decoders can have any buffering
         arrangements provided that they can decode standard-compliant
         bitstreams.  Thus, in practice, the input buffer for the video
         decoder can be integrated with the de-interleaving and de-jitter
         buffers of the receiver.

   max-dpb:  The value of max-dpb is an integer indicating the maximum
      decoded picture buffer size in units of 1024 bytes.  The max-dpb
      parameter signals that the receiver has more memory than the
      minimum amount of decoded picture buffer memory required by the
      signaled level conveyed in the value of the profile-level-id
      parameter.  When max-dpb is signaled, the receiver MUST be able to
      decode NAL unit streams that conform to the signaled level, with
      the exception that the MaxDPB value in Table A-1 of H.264 [4] for
      the signaled level is replaced with the value of max-dpb.
      Consequently, a receiver that signals max-dpb MUST be capable of
      storing the following number of decoded frames, complementary
      field pairs, and non-paired fields in its decoded picture buffer:

      Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs * 256 *
      ChromaFormatFactor ), 16)

      PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
      defined in H.264 [4].
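
      The formula above can be evaluated as in the following sketch.
      Two points are assumptions rather than statements from this
      registration: the result is rounded down to a whole number of
      frames, and ChromaFormatFactor defaults to 1.5, the value assumed
      here for 4:2:0 video.  The function name and the sample values
      are hypothetical.

```python
import math


def dpb_frame_capacity(max_dpb: float,
                       pic_width_in_mbs: int,
                       frame_height_in_mbs: int,
                       chroma_format_factor: float = 1.5) -> int:
    """Frames a receiver signaling max-dpb must be able to store.

    max_dpb is in units of 1024 bytes, as in the parameter definition.
    """
    frame_bytes = (pic_width_in_mbs * frame_height_in_mbs
                   * 256 * chroma_format_factor)
    # Assumption: round down, since only whole frames can be stored.
    return min(math.floor(1024 * max_dpb / frame_bytes), 16)


# Hypothetical: 11 x 9 macroblock pictures with max-dpb=150.
print(dpb_frame_capacity(150, 11, 9))
```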

      The value of max-dpb MUST be greater than or equal to the value of
      MaxDPB for the level given in Table A-1 of H.264 [4].  Senders MAY
      use this knowledge to construct coded video streams with improved
      compression.

         Informative note: This parameter was added primarily to
         complement a similar codepoint in the ITU-T Recommendation
         H.245, so as to facilitate signaling gateway designs.  The
         decoded picture buffer stores reconstructed samples and is a
         property of the video decoder only.  There is no relationship
         between the size of the decoded picture buffer and the buffers
         used in RTP, especially de-interleaving and de-jitter buffers.

   max-br:  The value of max-br is an integer indicating the maximum
      video bit rate in units of 1000 bits per second for the VCL HRD
      parameters (see A.3.1 item i of H.264 [4]) and in units of 1200
      bits per second for the NAL HRD parameters (see A.3.1 item j of
      H.264 [4]).

      The max-br parameter signals that the video decoder of the
      receiver is capable of decoding video at a higher bit rate than is
      required by the signaled level conveyed in the value of the
      profile-level-id parameter.  The value of max-br MUST be greater
      than or equal to the value of MaxBR for the level given in Table
      A-1 of H.264 [4].

      When max-br is signaled, the video codec of the receiver MUST be
      able to decode NAL unit streams that conform to the signaled
      level, conveyed in the profile-level-id parameter, with the
      following exceptions in the limits specified by the level:

      o  The value of max-br replaces the MaxBR value of the signaled
         level (in Table A-1 of H.264 [4]).

      o  When the max-cpb parameter is not present, the result of the
         following formula replaces the value of MaxCPB in Table A-1 of
         H.264 [4]: (MaxCPB of the signaled level) * max-br / (MaxBR of
         the signaled level).

      For example, if a receiver signals capability for Level 1.2 with
      max-br equal to 1550, this indicates a maximum video bitrate of
      1550 kbits/sec for VCL HRD parameters, a maximum video bitrate of
      1860 kbits/sec for NAL HRD parameters, and a CPB size of 4036458
      bits (1550000 / 384000 * 1000 * 1000).
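
      The arithmetic in this example can be reproduced with the sketch
      below.  The function name is hypothetical.  The level values are
      taken from the example itself (MaxBR of 384 and MaxCPB of 1000
      for Level 1.2, both in units of 1000 bits).  Rounding the CPB
      size down is an assumption that happens to match the figure
      quoted above.

```python
def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int):
    """Derive limits implied by max-br when max-cpb is not signaled.

    Returns (VCL bit rate in bit/s, NAL bit rate in bit/s,
             VCL CPB size in bits).
    """
    vcl_bit_rate = max_br * 1000   # VCL HRD unit: 1000 bits per second
    nal_bit_rate = max_br * 1200   # NAL HRD unit: 1200 bits per second
    # (MaxCPB of the level) * max-br / (MaxBR of the level), in units of
    # 1000 bits, converted to bits. Integer division rounds down (assumption).
    vcl_cpb_bits = level_max_cpb * max_br * 1000 // level_max_br
    return vcl_bit_rate, nal_bit_rate, vcl_cpb_bits


print(max_br_limits(1550, level_max_br=384, level_max_cpb=1000))
```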

      The value of max-br MUST be greater than or equal to the value of
      MaxBR for the signaled level given in Table A-1 of H.264 [4].

      Senders MAY use this knowledge to send higher bitrate video as
      allowed in the level definition of Annex A of H.264, to achieve
      improved video quality.

         Informative note: This parameter was added primarily to
         complement a similar codepoint in the ITU-T Recommendation
         H.245, so as to facilitate signaling gateway designs.  No
         assumption can be made from the value of this parameter that
         the network is capable of handling such bit rates at any given
         time.  In particular, no conclusion can be drawn that the
         signaled bit rate is possible under congestion control
         constraints.

   redundant-pic-cap:  This parameter signals the capabilities of a
      receiver implementation.  When equal to 0, the parameter indicates
      that the receiver makes no attempt to use redundant coded pictures
      to correct incorrectly decoded primary coded pictures.  When equal
      to 0, the receiver is not capable of using redundant slices;
      therefore, a sender SHOULD avoid sending redundant slices to save
      bandwidth.  When equal to 1, the receiver is capable of decoding
      any such redundant slice that covers a corrupted area in a primary
      decoded picture (at least partly), and therefore a sender MAY send
      redundant slices.  When the parameter is not present, then a value
      of 0 MUST be used for redundant-pic-cap.  When present, the value
      of redundant-pic-cap MUST be either 0 or 1.

      When the profile-level-id parameter is present in the same
      capability signaling as the redundant-pic-cap parameter, and the
      profile indicated in profile-level-id is such that it disallows
      the use of redundant coded pictures (e.g., Main Profile), the
      value of redundant-pic-cap MUST be equal to 0.  When a receiver
      indicates redundant-pic-cap equal to 0, the received stream SHOULD
      NOT contain redundant coded pictures.

         Informative note: Even if redundant-pic-cap is equal to 0, the
         decoder is able to ignore redundant coded pictures provided
         that the decoder supports such a profile (Baseline, Extended)
         in which redundant coded pictures are allowed.

         Informative note: Even if redundant-pic-cap is equal to 1, the
         receiver may also choose other error concealment strategies to
         replace or complement decoding of redundant slices.

   sprop-parameter-sets:  This parameter MAY be used to convey any
      sequence and picture parameter set NAL units (herein referred to
      as the initial parameter set NAL units) that MUST precede any
      other NAL units in decoding order.  The parameter MUST NOT be used
      to indicate codec capability in any capability exchange procedure.
      The value of the parameter is the base64 RFC 3548 [9]
      representation of the initial parameter set NAL units as specified
      in sections 7.3.2.1 and 7.3.2.2 of H.264 [4].  The parameter sets
      are conveyed in decoding order, and no framing of the parameter
      set NAL units takes place.  A comma is used to separate any pair
      of parameter sets in the list.  Note that the number of bytes in a
      parameter set NAL unit is typically less than 10, but a picture
      parameter set NAL unit can contain several hundreds of bytes.

         Informative note: When several payload types are offered in the
         SDP Offer/Answer model, each with its own sprop-parameter-sets
         parameter, then the receiver cannot assume that those parameter
         sets do not use conflicting storage locations (i.e., identical
         values of parameter set identifiers).  Therefore, a receiver
         should double-buffer all sprop-parameter-sets and make them
         available to the decoder instance that decodes a certain
         payload type.
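
      A receiver might unpack a sprop-parameter-sets value roughly as
      sketched below.  The function name is hypothetical, and the
      classification by nal_unit_type (the low five bits of the first
      NAL unit byte, 7 for a sequence parameter set and 8 for a picture
      parameter set) comes from H.264 rather than from this
      registration.  The sample bytes are made up for illustration and
      are not valid parameter sets.

```python
import base64
from typing import List, Tuple


def decode_sprop_parameter_sets(value: str) -> List[Tuple[int, bytes]]:
    """Split a sprop-parameter-sets value into (nal_unit_type, NAL bytes)."""
    result = []
    for item in value.split(","):          # a comma separates parameter sets
        nal = base64.b64decode(item.strip(), validate=True)
        if not nal:
            raise ValueError("empty parameter set entry")
        nal_unit_type = nal[0] & 0x1F      # low five bits of the NAL header
        result.append((nal_unit_type, nal))
    return result


# Made-up payloads: only the first byte (the NAL header) is meaningful here.
sample = ",".join(
    base64.b64encode(b).decode("ascii")
    for b in (bytes([0x67, 0x00, 0x80, 0x0A]), bytes([0x68, 0xCE]))
)
for nal_type, nal in decode_sprop_parameter_sets(sample):
    print(nal_type, nal.hex())
```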

   parameter-add:  This parameter MAY be used to signal whether the
      receiver of this parameter is allowed to add parameter sets in its
      signaling response using the sprop-parameter-sets MIME parameter.
      The value of this parameter is either 0 or 1.  A value of 0 means
      false; i.e., it is not allowed to add parameter sets.  A value of
      1 means true; i.e., it is allowed to add parameter sets.  If the
      parameter is not present, its value MUST be 1.

   packetization-mode:  This parameter signals the properties of an RTP
      payload type or the capabilities of a receiver implementation.
      Only a single configuration point can be indicated; thus, when
      capabilities to support more than one packetization-mode are
      declared, multiple configuration points (RTP payload types) must
      be used.

      When the value of packetization-mode is equal to 0 or
      packetization-mode is not present, the single NAL mode, as defined
      in section 6.2 of RFC 3984, MUST be used.  This mode is in use in
      standards using ITU-T Recommendation H.241 [5] (see section 12.1).
      When the value of packetization-mode is equal to 1, the non-
      interleaved mode, as defined in section 6.3 of RFC 3984, MUST be
      used.  When the value of packetization-mode is equal to 2, the
      interleaved mode, as defined in section 6.4 of RFC 3984, MUST be
      used.  The value of packetization mode MUST be an integer in the
      range of 0 to 2, inclusive.

   sprop-interleaving-depth:  This parameter MUST NOT be present when
      packetization-mode is not present or the value of packetization-
      mode is equal to 0 or 1.  This parameter MUST be present when the
      value of packetization-mode is equal to 2.

      This parameter signals the properties of a NAL unit stream.  It
      specifies the maximum number of VCL NAL units that precede any VCL
      NAL unit in the NAL unit stream in transmission order and follow
      the VCL NAL unit in decoding order.  Consequently, it is
      guaranteed that receivers can reconstruct NAL unit decoding order
      when the buffer size for NAL unit decoding order recovery is at
      least the value of sprop-interleaving-depth + 1 in terms of VCL
      NAL units.

      The value of sprop-interleaving-depth MUST be an integer in the
      range of 0 to 32767, inclusive.

   sprop-deint-buf-req:  This parameter MUST NOT be present when
      packetization-mode is not present or the value of packetization-
      mode is equal to 0 or 1.  It MUST be present when the value of
      packetization-mode is equal to 2.

      sprop-deint-buf-req signals the required size of the
      deinterleaving buffer for the NAL unit stream.  The value of the
      parameter MUST be greater than or equal to the maximum buffer
      occupancy (in units of bytes) required in such a deinterleaving
      buffer that is specified in section 7.2 of RFC 3984.  It is
      guaranteed that receivers can perform the deinterleaving of
      interleaved NAL units into NAL unit decoding order, when the
      deinterleaving buffer size is at least the value of sprop-deint-
      buf-req in terms of bytes.

      The value of sprop-deint-buf-req MUST be an integer in the range
      of 0 to 4294967295, inclusive.

         Informative note: sprop-deint-buf-req indicates the required
         size of the deinterleaving buffer only.  When network jitter
         can occur, an appropriately sized jitter buffer has to be
         provisioned for as well.
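
      The presence and range rules for packetization-mode,
      sprop-interleaving-depth, and sprop-deint-buf-req described above
      could be checked along the following lines.  This is a sketch
      only: the function name is hypothetical, and the parameters are
      assumed to have already been parsed into a dictionary of strings
      holding decimal integers.

```python
from typing import Dict, List

# Parameters that are required in interleaved mode, with their upper bounds.
INTERLEAVED_ONLY = {
    "sprop-interleaving-depth": 32767,
    "sprop-deint-buf-req": 4294967295,
}


def check_interleaving_params(params: Dict[str, str]) -> List[str]:
    """Return a list of rule violations (empty if none were found)."""
    errors = []
    # An absent packetization-mode is treated like mode 0.
    mode = int(params.get("packetization-mode", "0"))
    if mode not in (0, 1, 2):
        errors.append("packetization-mode must be in the range 0 to 2")
    for name, upper in INTERLEAVED_ONLY.items():
        present = name in params
        if mode == 2 and not present:
            errors.append(f"{name} must be present when packetization-mode is 2")
        elif mode != 2 and present:
            errors.append(f"{name} must not be present unless packetization-mode is 2")
        elif present and not 0 <= int(params[name]) <= upper:
            errors.append(f"{name} must be in the range 0 to {upper}")
    return errors


print(check_interleaving_params({"packetization-mode": "2"}))
```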

   deint-buf-cap:  This parameter signals the capabilities of a receiver
      implementation and indicates the amount of deinterleaving buffer
      space in units of bytes that the receiver has available for
      reconstructing the NAL unit decoding order.  A receiver is able to
      handle any stream for which the value of the sprop-deint-buf-req
      parameter is smaller than or equal to this parameter.

      If the parameter is not present, then a value of 0 MUST be used
      for deint-buf-cap.  The value of deint-buf-cap MUST be an integer
      in the range of 0 to 4294967295, inclusive.

         Informative note: deint-buf-cap indicates the maximum possible
         size of the deinterleaving buffer of the receiver only.  When
         network jitter can occur, an appropriately sized jitter buffer
         has to be provisioned for as well.

   sprop-init-buf-time:  This parameter MAY be used to signal the
      properties of a NAL unit stream.  The parameter MUST NOT be
      present, if the value of packetization-mode is equal to 0 or 1.

      The parameter signals the initial buffering time that a receiver
      MUST buffer before starting decoding to recover the NAL unit
      decoding order from the transmission order.  The parameter is the
      maximum value of (transmission time of a NAL unit - decoding time
      of the NAL unit), assuming reliable and instantaneous
      transmission, the same timeline for transmission and decoding, and
      that decoding starts when the first packet arrives.

      An example of specifying the value of sprop-init-buf-time
      follows.  A NAL unit stream is sent in the following interleaved
      order, in which the value corresponds to the decoding time and the
      transmission order is from left to right:

      0 2 1 3 5 4 6 8 7 ...

      Assuming a steady transmission rate of NAL units, the transmission
      times are:

      0 1 2 3 4 5 6 7 8 ...

      Subtracting the decoding time from the transmission time column-
      wise results in the following series:

      0 -1 1 0 -1 1 0 -1 1 ...

      Thus, in terms of intervals of NAL unit transmission times, the
      value of sprop-init-buf-time in this example is 1.
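
      The same calculation can be written as a short sketch.  The
      function name is hypothetical, and, as in the example above, a
      steady transmission rate of one NAL unit per interval is assumed,
      so the transmission time of a NAL unit equals its index in
      transmission order.

```python
from typing import List, Tuple


def init_buf_intervals(decoding_times: List[int]) -> Tuple[List[int], int]:
    """Return the per-NAL-unit differences and their maximum.

    decoding_times lists the decoding time of each NAL unit in
    transmission order, measured in NAL unit transmission intervals.
    """
    differences = [tx_time - dec_time
                   for tx_time, dec_time in enumerate(decoding_times)]
    return differences, max(differences)


series, value = init_buf_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])
print(series)
print(value)
```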

      The parameter is coded as a non-negative base10 integer
      representation in clock ticks of a 90-kHz clock.  If the
      parameter is not present, then no initial buffering time value is
      defined.  Otherwise the value of sprop-init-buf-time MUST be an
      integer in the range of 0 to 4294967295, inclusive.

      In addition to the signaled sprop-init-buf-time, receivers SHOULD
      take into account the transmission delay jitter buffering,
      including buffering for the delay jitter caused by mixers,
      translators, gateways, proxies, traffic-shapers, and other network
      elements.

   sprop-max-don-diff:  This parameter MAY be used to signal the
      properties of a NAL unit stream.  It MUST NOT be used to signal
      transmitter or receiver or codec capabilities.  The parameter MUST
      NOT be present if the value of packetization-mode is equal to 0 or
      1.  sprop-max-don-diff is an integer in the range of 0 to 32767,
      inclusive.  If sprop-max-don-diff is not present, the value of the
      parameter is unspecified.  sprop-max-don-diff is calculated as
      follows:

      sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
      for any i and any j>i,

      where i and j indicate the index of the NAL unit in the
      transmission order and AbsDON denotes a decoding order number of
      the NAL unit that does not wrap around to 0 after 65535.  In other
      words, AbsDON is calculated as follows: Let m and n be consecutive
      NAL units in transmission order.  For the very first NAL unit in
      transmission order (whose index is 0), AbsDON(0) = DON(0).  For
      other NAL units, AbsDON is calculated as follows:

      If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

      If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
      AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

      If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
      AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

      If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
      AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

      If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
      AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

      where DON(i) is the decoding order number of the NAL unit having
      index i in the transmission order.  The decoding order number is
      specified in section 5.5 of RFC 3984.
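
      The AbsDON rules and the sprop-max-don-diff formula above can be
      sketched as follows.  The function names are hypothetical.  One
      assumption is made beyond the text: the result is never allowed
      to drop below 0, because the parameter range starts at 0 and a
      stream sent strictly in decoding order would otherwise yield a
      negative maximum.

```python
from typing import List


def abs_don_sequence(dons: List[int]) -> List[int]:
    """Unwrap 16-bit DON values (in transmission order) into AbsDON values."""
    abs_dons: List[int] = []
    for n, don_n in enumerate(dons):
        if n == 0:
            abs_dons.append(don_n)              # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[n - 1], abs_dons[-1]
        if don_m == don_n:
            abs_n = abs_m
        elif don_m < don_n and don_n - don_m < 32768:
            abs_n = abs_m + don_n - don_m
        elif don_m > don_n and don_m - don_n >= 32768:
            abs_n = abs_m + 65536 - don_m + don_n   # forward across the wrap
        elif don_m < don_n and don_n - don_m >= 32768:
            abs_n = abs_m - (don_m + 65536 - don_n)  # backward across the wrap
        else:  # don_m > don_n and don_m - don_n < 32768
            abs_n = abs_m - (don_m - don_n)
        abs_dons.append(abs_n)
    return abs_dons


def sprop_max_don_diff(dons: List[int]) -> int:
    """max{AbsDON(i) - AbsDON(j)} over all i and all j > i."""
    best = 0           # assumption: clamp at 0, the lowest allowed value
    highest_so_far = None
    for value in abs_don_sequence(dons):
        if highest_so_far is not None:
            best = max(best, highest_so_far - value)
            highest_so_far = max(highest_so_far, value)
        else:
            highest_so_far = value
    return best


print(abs_don_sequence([65534, 65535, 0, 1]))
print(sprop_max_don_diff([0, 2, 1, 3, 5, 4]))
```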

         Informative note: Receivers may use sprop-max-don-diff to
         trigger which NAL units in the receiver buffer can be passed to
         the decoder.

   max-rcmd-nalu-size:  This parameter MAY be used to signal the
      capabilities of a receiver.  The parameter MUST NOT be used for
      any other purposes.  The value of the parameter indicates the
      largest NALU size in bytes that the receiver can handle
      efficiently.  The parameter value is a recommendation, not a
      strict upper boundary.  The sender MAY create larger NALUs but
      must be aware that the handling of these may come at a higher cost
      than NALUs conforming to the limitation.

      The value of max-rcmd-nalu-size MUST be an integer in the range of
      0 to 4294967295, inclusive.  If this parameter is not specified,
      no known limitation to the NALU size exists.  Senders still have
      to consider the MTU size available between the sender and the
      receiver and SHOULD run MTU discovery for this purpose.

      This parameter is motivated by, for example, an IP to H.223 video
      telephony gateway, where NALUs smaller than the H.223 transport
      data unit will be more efficient.  A gateway may terminate IP;
      thus, MTU discovery will normally not work beyond the gateway.

         Informative note: Setting this parameter to a lower than
         necessary value may have a negative impact.

   Encoding considerations:  This type is only defined for transfer via
      RTP (RFC 3550) and is framed and binary; see section 4.8 of
      RFC 4288.

   Security considerations:  See section X of RFC XXXX.

   Interoperability considerations:  None

   Published specification:  RFC XXXX and its reference section.

   Applications that use this media type:  None

   Additional information:  None

      Magic number(s):

      File extension(s):

      Macintosh file type code(s):

   Person & email address to contact for further information:
      Tom Kristensen <tom.kristensen at tandberg.com>, <tomkri at ifi.uio.no>

   Intended usage:  COMMON

   Restrictions on usage:  This media type depends on RTP framing, and
      hence is only defined for transfer via RTP (RFC 3550).
      Transport within other framing protocols is not defined at this
      time.

   Author:  Tom Kristensen

   Change controller:  IETF Audio/Video Transport working group
      delegated from the IESG.