Saturday, January 14, 2023

MPEG news: a report from the 140th meeting

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany, and the official press release can be found here and comprises the following items:
  • MPEG evaluates the Call for Proposals on Video Coding for Machines
  • MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
  • MPEG reaches the First Milestone for Haptics Coding
  • MPEG completes a New Standard for Video Decoding Interface for Immersive Media
  • MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
  • MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive video data is more and more analyzed without human intervention leading to a new paradigm referred to as Video Coding for Machines (VCM) which targets both (i) conventional video coding and (ii) feature coding [1].

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, respectively, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by various proposals.

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:
  • For the tested video dataset increases in compression efficiency of up to 87% compared to the video anchor and over 90% compared to the feature anchor were reported.
  • For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.
Research aspects: the main research area is still the same as described in my last blog post, i.e., compression efficiency (incl. probably runtime, sometimes called complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation, object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).

Research aspect: VDI is also a promising standard to enable the implementation of viewport adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. However, requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for enabling efficiency in such cases, and VDI provides the basis for its implementation.


Finally, I'd like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, amendments, Defects under Investigation (DuI), and Technologies under Consideration (TuC) are output documents, as well as a new working draft called Redundant encoding and packaging for segmented live media (REAP), which eventually will become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

No comments: