Saturday, May 18, 2024

MPEG news: a report from the 146th meeting

This blog post is based on the MPEG press release and has been modified/updated here to focus on and highlight research aspects. This version of the blog post will also be posted at ACM SIGMM Records.


The 146th MPEG meeting was held in Rennes, France from 22-26 April 2024, and the official press release can be found here. It comprises the following highlights:
  • AI-based Point Cloud Coding*: Call for proposals focusing on AI-driven point cloud encoding for applications such as immersive experiences and autonomous driving.
  • Object Wave Compression*: Call for interest in object wave compression for enhancing computer holography transmission.
  • Open Font Format: Committee Draft of the fifth edition, overcoming previous limitations like the 64K glyph encoding constraint.
  • Scene Description: Ratified second edition, integrating immersive media objects and extending support for various data types.
  • MPEG Immersive Video (MIV): New features in the second edition, enhancing the compression of immersive video content.
  • Video Coding Standards: New editions of AVC, HEVC, and Video CICP, incorporating additional SEI messages and extended multiview profiles.
  • Machine-Optimized Video Compression*: Advancement in optimizing video encoders for machine analysis.
  • MPEG-I Immersive Audio*: Reached Committee Draft stage, supporting high-quality, real-time interactive audio rendering for VR/AR/MR.
  • Video-based Dynamic Mesh Coding (V-DMC)*: Committee Draft status for efficiently storing and transmitting dynamic 3D content.
  • LiDAR Coding*: Enhanced efficiency and responsiveness in LiDAR data processing with the new standard reaching Committee Draft status.
* ... covered in this column.

AI-based Point Cloud Coding

MPEG issued a Call for Proposals (CfP) on AI-based point cloud coding technologies as a result of ongoing explorations regarding use cases, requirements, and the capabilities of AI-driven point cloud encoding, particularly for dynamic point clouds.

With recent significant progress in AI-based point cloud compression technologies, MPEG is keen on studying and adopting AI methodologies. MPEG is specifically looking for learning-based codecs capable of handling a broad spectrum of dynamic point clouds, which are crucial for applications ranging from immersive experiences to autonomous driving and navigation. As the field evolves rapidly, MPEG expects to receive multiple innovative proposals. These may include a unified codec, capable of addressing multiple types of point clouds, or specialized codecs tailored to meet specific requirements, contingent upon demonstrating clear advantages. MPEG has therefore publicly called for submissions of AI-based point cloud codecs, aimed at deepening the understanding of the various options available and their respective impacts. Submissions that meet the requirements outlined in the call will be invited to provide source code for further analysis, potentially laying the groundwork for a new standard in AI-based point cloud coding. MPEG welcomes all relevant contributions and looks forward to evaluating the responses.

Research aspects: In-depth analysis of algorithms, techniques, and methodologies, including a comparative study of various AI-driven point cloud compression techniques to identify the most effective approaches. Other aspects include creating or improving learning-based codecs that can handle dynamic point clouds as well as metrics for evaluating the performance of these codecs in terms of compression efficiency, reconstruction quality, computational complexity, and scalability. Finally, the assessment of how improved point cloud compression can enhance user experiences would be worthwhile to consider here also.
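
To make the metrics dimension concrete, below is a minimal sketch of the point-to-point (D1) geometry PSNR, one of the objective measures customarily used when comparing point cloud codecs. It assumes numpy/scipy and two XYZ point arrays; MPEG's own evaluations rely on dedicated tooling (e.g., the pc_error utility), so this is an illustration rather than the normative procedure.

```python
import numpy as np
from scipy.spatial import cKDTree

def d1_psnr(reference: np.ndarray, distorted: np.ndarray, peak: float) -> float:
    """Point-to-point (D1) geometry PSNR between two (N, 3) XYZ arrays.
    `peak` is the signal peak, e.g., 1023 for 10-bit voxelized content."""
    # For each distorted point, find the squared Euclidean distance to its
    # nearest reference point; the mean of these is the (one-way) D1 MSE.
    dists, _ = cKDTree(reference).query(distorted, k=1)
    mse = np.mean(dists ** 2)
    return 10.0 * np.log10((3 * peak ** 2) / mse)
```

Symmetric variants run the query in both directions and take the maximum (or average) of the two errors; learning-based codecs can then be compared by their rate-distortion curves over such metrics.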

Object Wave Compression

A Call for Interest (CfI) in object wave compression has been issued by MPEG. Computer holography, a 3D display technology, utilizes a digital fringe pattern called a computer-generated hologram (CGH) to reconstruct 3D images from input 3D models. Holographic near-eye displays (HNEDs) reduce the need for extensive pixel counts due to their wearable design, positioning the display near the eye. This positions HNEDs as frontrunners for the early commercialization of computer holography, with significant research underway for product development. Innovative approaches facilitate the transmission of object wave data, crucial for CGH calculations, over networks. Object wave transmission offers several advantages, including independent treatment from playback device optics, lower computational complexity, and compatibility with video coding technology. These advancements open doors for diverse applications, ranging from entertainment experiences to real-time two-way spatial transmissions, revolutionizing fields such as remote surgery and virtual collaboration. As MPEG explores object wave compression for computer holography transmission, a Call for Interest seeks contributions to address market needs in this field.

Research aspects: Apart from compression efficiency, lower computational complexity, and compatibility with video coding technology, there is a range of research aspects, including the design, implementation, and evaluation of coding algorithms within the scope of this CfI. The QoE of CGHs rendered on HNEDs is yet another dimension to be explored.
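
For readers unfamiliar with the underlying signal: the object wave is the complex wave field an object emits, numerically propagated to the hologram plane as the basis of the CGH calculation. Below is a minimal sketch of the angular spectrum method, a standard propagation technique in this space; it assumes numpy and is illustrative background, not part of the CfI.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex 2D object wave `field` by `distance` meters.
    `pitch` is the sampling interval (pixel size) in meters."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    # Free-space transfer function; evanescent components are zeroed out.
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2
    H = np.where(arg > 0,
                 np.exp(2j * np.pi * distance * np.sqrt(np.maximum(arg, 0.0))),
                 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

Compressing such complex-valued fields efficiently, ideally reusing video coding tools, is exactly what the CfI is probing.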

Machine-Optimized Video Compression

MPEG started working on a technical report on the "Optimization of Encoders and Receiving Systems for Machine Analysis of Coded Video Content". In recent years, the efficacy of machine learning-based algorithms in video content analysis has steadily improved. However, an encoder designed for human consumption does not always produce compressed video conducive to effective machine analysis. This challenge lies not in the compression standard but in optimizing the encoder or receiving system. The forthcoming technical report addresses this gap by showcasing technologies and methods that optimize encoders or receiving systems to enhance machine analysis performance.

Research aspects: Video (and audio) coding for machines has recently been addressed by the MPEG Video and Audio working groups, respectively. The Joint Video Experts Team (JVET) of MPEG and ITU-T SG16 has now joined this space with the technical report mentioned above, but the research aspects remain unchanged, i.e., coding efficiency, metrics, and quality aspects for machine analysis of compressed/coded video content.
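
A typical experimental setup in this space sweeps encoder settings and measures the resulting machine task performance against the spent bitrate. A hedged sketch follows, assuming ffmpeg with libx264 on the path; the task evaluation callable is supplied by the experimenter and is not part of any MPEG tool.

```python
import os
import subprocess

def rate_accuracy_sweep(src, run_task, crfs=(23, 28, 33, 38)):
    """Encode `src` at several quality levels and pair the resulting
    file size with the machine task score. `run_task` maps a video path
    to an accuracy figure (e.g., detection mAP against ground truth)."""
    results = []
    for crf in crfs:
        out = f"encoded_crf{crf}.mp4"
        # With libx264, a higher CRF means stronger compression (lower rate).
        subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
                        "-crf", str(crf), out], check=True)
        results.append((crf, os.path.getsize(out) * 8, run_task(out)))
    return results
```

The resulting rate-accuracy points play the role that rate-distortion points play for human consumption, which is precisely the gap the technical report targets.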

MPEG-I Immersive Audio

MPEG Audio Coding enters the "immersive space" with MPEG-I immersive audio and its corresponding reference software. The MPEG-I immersive audio standard sets a new benchmark for compact and lifelike audio representation in virtual and physical spaces, catering to Virtual, Augmented, and Mixed Reality (VR/AR/MR) applications. By enabling high-quality, real-time interactive rendering of audio content with six degrees of freedom (6DoF), users can experience immersion, freely exploring 3D environments while enjoying dynamic audio. Designed in accordance with MPEG's rigorous standards, MPEG-I immersive audio ensures efficient distribution across bandwidth-constrained networks without compromising on quality. Unlike proprietary frameworks, this standard prioritizes interoperability, stability, and versatility, supporting both streaming and downloadable content while seamlessly integrating with MPEG-H 3D audio compression. MPEG-I's comprehensive modeling of real-world acoustic effects, including sound source properties and environmental characteristics, guarantees an authentic auditory experience. Moreover, its efficient rendering algorithms balance computational complexity with accuracy, empowering users to finely tune scene characteristics for desired outcomes.

Research aspects: Evaluating the QoE of MPEG-I immersive audio-enabled environments and enabling efficient audio distribution across bandwidth-constrained networks without compromising quality are two important research aspects to be addressed by the research community.
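
To give a flavor of what 6DoF rendering entails computationally, the sketch below implements a single ingredient: inverse-distance attenuation of a point source as the listener moves through the scene. A full renderer such as the MPEG-I one additionally models source directivity, occlusion, reverberation, Doppler, and more; the function name and defaults here are illustrative only.

```python
import numpy as np

def source_gain(listener_pos, source_pos, ref_dist=1.0, min_dist=0.1):
    """Inverse-distance (1/r) gain of a point source for a 6DoF listener.
    `min_dist` clamps the distance to avoid unbounded gain at the source."""
    r = np.linalg.norm(np.asarray(listener_pos) - np.asarray(source_pos))
    return ref_dist / max(r, min_dist)
```

Balancing the accuracy of such acoustic models against real-time complexity budgets is exactly the trade-off mentioned above.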

Video-based Dynamic Mesh Coding (V-DMC)

Video-based Dynamic Mesh Compression (V-DMC) represents a significant advancement in 3D content compression, catering to the ever-increasing complexity of dynamic meshes used across various applications, including real-time communications, storage, free-viewpoint video, augmented reality (AR), and virtual reality (VR). The standard addresses the challenges associated with dynamic meshes that exhibit time-varying connectivity and attribute maps, which were not sufficiently supported by previous standards. Video-based Dynamic Mesh Compression promises to revolutionize how dynamic 3D content is stored and transmitted, allowing more efficient and realistic interactions with 3D content globally.

Research aspects: V-DMC aims to allow "more efficient and realistic interactions with 3D content", which is itself a subject of research, i.e., compression efficiency vs. QoE in constrained networked environments.
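
The core idea behind video-based mesh coding is to turn geometry data into image planes so that mature video codecs can do the heavy lifting. The toy sketch below quantizes per-vertex displacement vectors and raster-packs them into a frame; it illustrates that general concept only and is not the normative V-DMC packing process.

```python
import numpy as np

def pack_displacements(disp: np.ndarray, width: int, bitdepth: int = 10):
    """Uniformly quantize (N, 3) displacement vectors and raster-pack
    them into a `width`-pixel-wide image a video encoder can compress."""
    n = disp.shape[0]
    peak = (1 << bitdepth) - 1
    lo, hi = float(disp.min()), float(disp.max())
    scale = max(hi - lo, 1e-12)  # guard against constant input
    q = np.round((disp - lo) / scale * peak).astype(np.uint16)
    height = -(-n // width)  # ceil(n / width) rows for n vertices
    frame = np.zeros((height, width, 3), dtype=np.uint16)
    frame.reshape(-1, 3)[:n] = q
    return frame, (lo, hi)  # (lo, hi) is needed to dequantize at the decoder
```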

Low Latency, Low Complexity LiDAR Coding

Low Latency, Low Complexity LiDAR Coding underscores MPEG's commitment to advancing coding technologies required by modern LiDAR applications across diverse sectors. The new standard addresses critical needs in the processing and compression of LiDAR-acquired point clouds, which are integral to applications ranging from automated driving to smart city management. It provides an optimized solution for scenarios requiring high efficiency in both compression and real-time delivery, responding to the increasingly complex demands of LiDAR data handling. LiDAR technology has become essential for various applications that require detailed environmental scanning, from autonomous vehicles navigating roads to robots mapping indoor spaces. The Low Latency, Low Complexity LiDAR Coding standard will facilitate a new level of efficiency and responsiveness in LiDAR data processing, which is critical for the real-time decision-making capabilities needed in these applications. This standard builds on comprehensive analysis and industry feedback to address specific challenges such as noise reduction, temporal data redundancy, and the need for region-based quality of compression. The standard also emphasizes the importance of low latency coding to support real-time applications, essential for operational safety and efficiency in dynamic environments.

Research aspects: This standard tackles the challenge of balancing high compression efficiency with real-time capability, two often conflicting goals. Researchers may take this trade-off, together with complexity and region-based quality control, as a starting point for meaningful contributions.
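
To illustrate the low-complexity end of that design space: neighboring LiDAR range samples are strongly correlated, so even simple predictive (delta) coding concentrates residuals near zero, which an entropy coder can then compress cheaply and with minimal delay. A toy sketch follows, assuming numpy; actual low-latency LiDAR coding is considerably more elaborate.

```python
import numpy as np

def delta_encode_ranges(ranges_mm: np.ndarray) -> np.ndarray:
    """Residuals of each range sample against its predecessor; the first
    sample is carried as-is (predicted from zero)."""
    return np.diff(ranges_mm.astype(np.int64), prepend=0)

def delta_decode_ranges(residuals: np.ndarray) -> np.ndarray:
    """Exact inverse of delta_encode_ranges."""
    return np.cumsum(residuals)
```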

The 147th MPEG meeting will be held in Sapporo, Japan, from July 15-19, 2024. Click here for more information about MPEG meetings and their developments.

Friday, May 17, 2024

MPEG news: a report from the 145th meeting

This blog post is based on the MPEG press release and has been modified/updated here to focus on and highlight research aspects. This version of the blog post will also be posted at ACM SIGMM Records.


The 145th MPEG meeting was held online from 22-26 January 2024, and the official press release can be found here. It comprises the following highlights:
  • Latest Edition of the High Efficiency Image Format Standard Unveils Cutting-Edge Features for Enhanced Image Decoding and Annotation
  • MPEG Systems finalizes Standards supporting Interoperability Testing
  • MPEG finalizes the Third Edition of MPEG-D Dynamic Range Control
  • MPEG finalizes the Second Edition of MPEG-4 Audio Conformance
  • MPEG Genomic Coding extended to support Transport and File Format for Genomic Annotations
  • MPEG White Paper: Neural Network Coding (NNC) – Efficient Storage and Inference of Neural Networks for Multimedia Applications
This column will focus on the High Efficiency Image Format (HEIF) and interoperability testing. As usual, a brief update on MPEG-DASH et al. will be provided.

High Efficiency Image Format (HEIF)

The High Efficiency Image Format (HEIF) is a widely adopted standard in the imaging industry that continues to grow in popularity. At the 145th MPEG meeting, MPEG Systems (WG 3) ratified its third edition, which introduces exciting new features, such as progressive decoding capabilities that enhance image quality through a sequential, single-decoder instance process. With this enhancement, users can decode bitstreams in successive steps, with each phase delivering perceptible improvements in image quality compared to the preceding step. Additionally, the new edition introduces a sophisticated data structure that describes the spatial configuration of the camera and outlines the unique characteristics responsible for generating the image content. The update also includes innovative tools for annotating specific areas in diverse shapes, adding a layer of creativity and customization to image content manipulation. These annotation features cater to the diverse needs of users across various industries.

Research aspects: Progressive coding has been a part of modern image coding formats for some time now. However, the inclusion of supplementary metadata provides an opportunity to explore new use cases that can benefit both user experience (UX) and quality of experience (QoE) in academic settings.
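
One way to quantify the perceptible improvement per step in such UX/QoE studies is a per-step quality curve against the fully decoded image. A minimal sketch, assuming numpy arrays for the image decoded at each refinement step; PSNR is used purely for illustration, and perceptual metrics may be more appropriate.

```python
import numpy as np

def progressive_quality_curve(steps, final, peak=255.0):
    """Per-step PSNR of successively refined decodes against the final
    image, quantifying how much quality each progressive step delivers."""
    final = final.astype(np.float64)
    curve = []
    for i, img in enumerate(steps, start=1):
        mse = np.mean((img.astype(np.float64) - final) ** 2)
        psnr = float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
        curve.append((i, psnr))
    return curve
```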

Interoperability Testing

MPEG standards typically comprise format definitions (or specifications) to enable interoperability among products and services from different vendors. Interestingly, MPEG goes beyond these format specifications and provides reference software and conformance bitstreams, allowing conformance testing.

At the 145th MPEG meeting, MPEG Systems (WG 3) finalized two standards comprising conformance and reference software by promoting them to Final Draft International Standard (FDIS), the final stage of standards development. The finalized standards, ISO/IEC 23090-24 and ISO/IEC 23090-25, showcase the pinnacle of conformance and reference software for scene description and visual volumetric video-based coding data, respectively.

ISO/IEC 23090-24 focuses on conformance and reference software for scene description, providing a comprehensive reference implementation and bitstream tailored for conformance testing related to ISO/IEC 23090-14, scene description. This standard opens new avenues for advancements in scene depiction technologies, setting a new standard for conformance and software reference in this domain.

Similarly, ISO/IEC 23090-25 targets conformance and reference software for the carriage of visual volumetric video-based coding data. With a dedicated reference implementation and bitstream, this standard is poised to elevate the conformance testing standards for ISO/IEC 23090-10, the carriage of visual volumetric video-based coding data. The introduction of this standard is expected to have a transformative impact on the visualization of volumetric video data.

At the same 145th MPEG meeting, MPEG Audio Coding (WG 6) celebrated the completion of the second edition of ISO/IEC 14496-26, audio conformance, elevating it to the Final Draft International Standard (FDIS) stage. This significant update incorporates seven corrigenda and five amendments into the initial edition, originally published in 2010.

ISO/IEC 14496-26 serves as a pivotal standard, providing a framework for designing tests to ensure the compliance of compressed data and decoders with the requirements outlined in ISO/IEC 14496-3 (MPEG-4 Audio). The second edition reflects an evolution of the original, addressing key updates and enhancements through diligent amendments and corrigenda. This latest edition, now at the FDIS stage, marks a notable stride in MPEG Audio Coding's commitment to refining audio conformance standards and ensuring the seamless integration of compressed data within the MPEG-4 Audio framework.

These standards will be made freely accessible for download on the official ISO website, ensuring widespread availability for industry professionals, researchers, and enthusiasts alike.

Research aspects: Reference software and conformance bitstreams often serve as the basis for further research (and development) activities and, thus, are highly appreciated. For example, the reference software of video coding formats (e.g., HM for HEVC, VTM for VVC) can be used as a baseline when improving coding efficiency or other aspects of the coding format.
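
In practice, conformance bitstreams are used by decoding them and comparing the output against the reference decoder's output, often via checksums. A minimal sketch of that comparison step, assuming raw YUV files on disk (file names are illustrative):

```python
import hashlib

def yuv_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 digest of a decoded raw video file, computed in chunks so
    large files do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# A decoder under test conforms (for this bitstream) if its output matches:
# assert yuv_md5("my_decoder_out.yuv") == yuv_md5("reference_out.yuv")
```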

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below.
MPEG-DASH Status, January 2024.

The following most notable aspects have been discussed at the 145th MPEG meeting and adopted into ISO/IEC 23009-1, which will eventually become the 6th edition of the MPEG-DASH standard:
  • It is now possible to pass the CMCD parameters sid and cid via the MPD URL (see the sketch after this list).
  • Segment duration patterns can be signaled using SegmentTimeline.
  • Definition of a background mode of operation, which allows a DASH player to receive MPD updates and listen to events without decrypting or rendering any media.
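
As an illustration of the first item above: CMCD (CTA-5004) carries its keys in a single CMCD query argument, so attaching the session and content identifiers to the MPD request can be sketched as follows (standard library only; the helper itself is illustrative, while the sid/cid key formatting follows CTA-5004).

```python
import uuid
from urllib.parse import urlencode

def mpd_url_with_cmcd(mpd_url: str, content_id: str) -> str:
    """Append CMCD session (sid) and content (cid) identifiers to an MPD URL."""
    cmcd = f'cid="{content_id}",sid="{uuid.uuid4()}"'  # string keys are quoted
    sep = "&" if "?" in mpd_url else "?"
    return f"{mpd_url}{sep}{urlencode({'CMCD': cmcd})}"

# mpd_url_with_cmcd("https://example.com/live.mpd", "movie-123")
# -> "https://example.com/live.mpd?CMCD=cid%3D%22movie-123%22%2C..."
```
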
Additionally, the technologies under consideration (TuC) document has been updated with means to signal the maximum segment rate, extend copyright license signaling, and improve haptics signaling in DASH. Finally, REAP is progressing toward FDIS but is not there yet; most details will be discussed in the upcoming ad hoc group (AhG) period.

The 146th MPEG meeting will be held in Rennes, France, from April 22-26, 2024. Click here for more information about MPEG meetings and their developments.


Thursday, May 16, 2024

Assistant Professor (postdoc) with QA option (tenure track) (all genders welcome)

Department of Information Technology 

Scientific Staff  | Full time

Application deadline: 12 June 2024

Reference code: 673/23 [URL]

The University of Klagenfurt, with approximately 1,500 employees and over 12,000 students, is located in the Alps-Adriatic region and consistently achieves excellent placements in rankings. The motto “per aspera ad astra” underscores our firm commitment to the pursuit of excellence in all activities in research, teaching, and university management. The principles of equality, diversity, health, sustainability, and compatibility of work and family life serve as the foundation for our work at the university.

The University of Klagenfurt is pleased to announce the following open position at the Department of Information Technology at the Faculty of Technical Sciences with an expected starting date of 7 January 2025:

Assistant Professor (postdoc) with QA option (tenure track) (all genders welcome)

Level of employment: 100 % (40 hours/week)

Minimum salary: € 66,532.20 per annum (gross), Classification according to collective agreement: B1 lit.b

Limited to: 6 years (with the option of transitioning to a permanent contract)

Application deadline: 12 June 2024

Reference code: 673/23

Area of responsibility

  • Independent research in computer science and communication technologies with the aim of habilitation
  • Independent delivery of courses in English and German using established and innovative methods
  • Participation in the research and teaching projects run by the organisational unit
  • Acquisition and management of third-party funded projects
  • Supervision of students at Bachelor, Master, and doctoral levels
  • Participation in organisational and administrative tasks and in quality assurance measures
  • Contribution to expanding the international scientific and cultural contacts of the organisational unit
  • Participation in public relations activities including third mission

Requirements

  • Doctoral degree in the field of computer science, information and communications engineering, electrical engineering or related fields completed at a domestic or foreign higher education institution
  • Relevant and good publication record in the field of multimedia systems
  • A strong background in one or both of the following fields:
    • (Distributed) multimedia systems, preferably covering video in the context of video coding, communication, streaming, and quality of experience (QoE);
    • Machine learning, preferably in the context of (distributed) multimedia systems and/or computer vision
  • Very good scientific communication and dissemination skills (scientific writing and oral presentations)
  • Excellent programming skills in multimedia systems and/or machine learning
  • Excellent spoken and written English skills

Desired skills

  • Experience in the acquisition and running of third-party funded projects and readiness to play an active role in third-party funded projects and their acquisition
  • Didactic competence and proven successful teaching experience
  • Willingness to actively participate in research, teaching, and administration
  • Scientific curiosity and enthusiasm for imparting knowledge
  • Gender mainstreaming and diversity management skills
  • Leadership and teamwork skills
  • Good spoken and written German skills

Additional information

Our offer:

This tenure track position includes the option of negotiating a qualification agreement in accordance with Section 27 of the collective agreement for university staff for the areas of research, independent teaching, management and administrative tasks, and experience gained externally (QA). The employment contract is concluded for the position as Assistant Professor (postdoc) with QA option and stipulates a starting salary of € 4,752.30 gross per month (14 times a year; previous experience deemed relevant to the job can be recognised in accordance with the collective agreement). Upon entering into the qualification agreement, the position shall be classified as an Assistant Professorship with a minimum gross salary of € 5,595.60 per month. Upon fulfilling the stipulations of the qualification agreement, the post-holder shall be promoted to tenured Associate Professor with a minimum gross salary of € 6,055.70 per month.

The University of Klagenfurt also offers:

  • Personal and professional advanced training courses, management, and career coaching
  • Numerous attractive additional benefits, see also https://jobs.aau.at/en/the-university-as-employer/
  • Diversity- and family-friendly university culture
  • The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature, and sports

The application:

If you are interested in this position, please apply in German or English, providing a convincing application including the following:

  • Letter of application, including – but not limited to – motivation as well as a concise research and teaching statement, respectively
  • Curriculum vitae, including publication and lecture lists, as well as details and an explanation of research and teaching activities (please do not include a photo)

Furthermore:

  • Proof of all completed higher education programmes (certificates, supplements, if applicable)
  • Outline of the content of the doctoral programme (listing academic achievements, intermediate examinations, etc.) as well as the content of the thesis (summary)
  • Other documentary evidence that may be relevant to this announcement (see prerequisites and desired qualifications)
  • Please provide three references (contact details of persons who the university may contact by telephone for information purposes)

To apply, please select the position with the reference code 673/23 in the category “Scientific Staff” using the link “Apply for this position” in the job portal at https://jobs.aau.at/en/.

Candidates must furnish proof that they meet the required qualifications by 12 June 2024 at the latest.

For further information on this specific vacancy, please contact Prof. Christian Timmerer (christian.timmerer@aau.at). General information about the university as an employer can be found at https://jobs.aau.at/en/the-university-as-employer/. At the University of Klagenfurt, recruitment and staff matters are accompanied not only by the authority responsible for the recruitment procedure but also by the Equal Opportunities Working Group and, if necessary, by the Representative for Disabled Persons.

The University of Klagenfurt aims to increase the proportion of women and therefore specifically invites qualified women to apply for the position. Where the qualification is equivalent, women will be given preferential consideration.

As part of its human resources policy, the University of Klagenfurt places particular emphasis on anti-discrimination, equal opportunities, and diversity.

People with disabilities or chronic diseases, who fulfil the requirements, are particularly encouraged to apply.

Travel and accommodation costs incurred during the application process will not be refunded.

Translations into other languages shall serve informational purposes only. Solely the version advertised in the University Bulletin (Mitteilungsblatt) shall be legally binding. 

Wednesday, January 17, 2024

Streaming week in Denver: MOQ interim + Mile-High Video + SVTA Segments

The next Media over QUIC (MOQ) interim meeting will be hosted by Comcast in Denver (Feb. 6-8). It is open to public participation and it is free. Details are here: https://github.com/moq-wg/wg-materials/blob/main/interim-24-02/arrangements.md

Then, the ACM Mile-High Video conference will be just a few miles away (including a Latency Party during the Super Bowl) between Feb. 11-14. Details are here: https://www.mile-high.video/technical-program


Finally, SVTA Segments 2024 will take place at the same venue on Feb. 14th. Details are here: https://segments2024.svta.org/


You can benefit from the early (and combo) registration rates for Mile-High Video and Segments.



Thursday, December 7, 2023

Hat-Trick Victory: MPEG-DASH Papers Shine in ACM SIGMM Test of Time Awards

The Association for Computing Machinery (ACM) Special Interest Group on Multimedia (SIGMM) presents a Test of Time Award. The details for this award can be found here; I took the liberty of copying the main aspects as follows.

"This award is presented every year, starting in 2020, to the authors of the paper published either 10, 11 or 12 years previously at an SIGMM sponsored or co-sponsored conference (so the 2020 award would be for papers at a 2008, 2009 or 2010 SIGMM conference). The award recognizes the paper that has had the most impact and influence on the field of Multimedia in terms of research, development, product or ideas, during the intervening years, as selected by a selection committee. The contributions the selection committee will focus on may be theoretical advances, techniques and/or software tools that have been widely used, and/or innovative applications that have had impact on multimedia computing."

Interestingly, in the past three years, papers related to MPEG-DASH were always among the winners or honorable mentions as follows:

2021 Awards

Winner (MM Systems & Networking)

Thomas Stockhammer. 2011. Dynamic adaptive streaming over HTTP --: standards and design principles. In Proceedings of the second annual ACM conference on Multimedia systems (MMSys '11). Association for Computing Machinery, New York, NY, USA, 133–144. https://dl.acm.org/doi/10.1145/1943552.1943572

Abstract: In this paper, we provide some insight and background into the Dynamic Adaptive Streaming over HTTP (DASH) specifications as available from 3GPP and in draft version also from MPEG. Specifically, the 3GPP version provides a normative description of a Media Presentation, the formats of a Segment, and the delivery protocol. In addition, it adds an informative description on how a DASH Client may use the provided information to establish a streaming service for the user. The solution supports different service types (e.g., On-Demand, Live, Time-Shift Viewing), different features (e.g., adaptive bitrate switching, multiple language support, ad insertion, trick modes, DRM) and different deployment options. Design principles and examples are provided.

2022 Awards

Honorable Mention, in the category of “Multimedia Systems-Networks”

Saamer Akhshabi, Ali C. Begen, and Constantine Dovrolis. 2011. An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP. In Proceedings of the second annual ACM conference on Multimedia systems (MMSys '11). Association for Computing Machinery, New York, NY, USA, 157–168. https://dl.acm.org/doi/10.1145/1943552.1943574

Abstract: Adaptive (video) streaming over HTTP is gradually being adopted, as it offers significant advantages in terms of both user-perceived quality and resource utilization for content and network service providers. In this paper, we focus on the rate-adaptation mechanisms of adaptive streaming and experimentally evaluate two major commercial players (Smooth Streaming, Netflix) and one open source player (OSMF). Our experiments cover three important operating conditions. First, how does an adaptive video player react to either persistent or short-term changes in the underlying network available bandwidth. Can the player quickly converge to the maximum sustainable bitrate? Second, what happens when two adaptive video players compete for available bandwidth in the bottleneck link? Can they share the resources in a stable and fair manner? And third, how does adaptive streaming perform with live content? Is the player able to sustain a short playback delay? We identify major differences between the three players, and significant inefficiencies in each of them.
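
For context, here is a minimal example of the kind of throughput-based rate-adaptation heuristic the paper evaluates: estimate bandwidth from recent segment downloads and pick the highest sustainable rung of the bitrate ladder. The names and the 0.8 safety margin are illustrative and not taken from any of the evaluated players.

```python
def select_bitrate(throughput_kbps_history, ladder_kbps, safety=0.8):
    """Pick the highest ladder bitrate below a safety margin of the
    harmonic-mean throughput estimate (robust to short-term spikes)."""
    n = len(throughput_kbps_history)
    estimate = n / sum(1.0 / t for t in throughput_kbps_history)
    budget = safety * estimate
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)

# select_bitrate([3500, 4200, 3900], [1000, 2500, 5000, 8000]) -> 2500
```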

2023 Awards

Honorable Mention, in the category of "MM Systems & Networking"

Stefan Lederer, Christopher Müller, and Christian Timmerer. 2012. Dynamic adaptive streaming over HTTP dataset. In Proceedings of the 3rd Multimedia Systems Conference (MMSys '12). Association for Computing Machinery, New York, NY, USA, 89–94. https://dl.acm.org/doi/10.1145/2155555.2155570

Abstract: The delivery of audio-visual content over the Hypertext Transfer Protocol (HTTP) got lot of attention in recent years and with dynamic adaptive streaming over HTTP (DASH) a standard is now available. Many papers cover this topic and present their research results, but unfortunately all of them use their own private dataset which -- in most cases -- is not publicly available. Hence, it is difficult to compare, e.g., adaptation algorithms in an objective way due to the lack of a common dataset which shall be used as basis for such experiments. In this paper, we present our DASH dataset including our DASHEncoder, an open source DASH content generation tool. We also provide basic evaluations of the different segment lengths, the influence of HTTP server settings, and, in this context, we show some of the advantages as well as problems of shorter segment lengths.

Tuesday, December 5, 2023

A Tutorial on Immersive Video Delivery: From Omnidirectional Video to Holography

IEEE Communications Surveys and Tutorials

[PDF]

Jeroen van der Hooft (Ghent University, Belgium), Hadi Amirpour (AAU, Austria), Maria Torres Vega (KU Leuven, Belgium), Yago Sanchez (Fraunhofer/HHI), Raimund Schatz (AIT, Austria), Thomas Schierl (Fraunhofer/HHI, Germany), and Christian Timmerer (AAU, Austria)

Abstract: Video services are evolving from traditional two-dimensional video to virtual reality and holograms, which offer six degrees of freedom to users, enabling them to freely move around in a scene and change focus as desired. However, this increase in freedom translates into stringent requirements in terms of ultra-high bandwidth (in the order of Gigabits per second) and minimal latency (in the order of milliseconds). To realize such immersive services, the network transport, as well as the video representation and encoding, have to be fundamentally enhanced. The purpose of this tutorial article is to provide an elaborate introduction to the creation, streaming, and evaluation of immersive video. Moreover, it aims to provide lessons learned and to point at promising research paths to enable truly interactive immersive video applications toward holography.

Keywords—Immersive video delivery, 3DoF, 6DoF, omnidirectional video, volumetric video, point clouds, meshes, light fields, holography, end-to-end systems

J. van der Hooft, H. Amirpour, M. Torres Vega, Y. Sanchez, R. Schatz, T. Schierl, C. Timmerer, "A Tutorial on Immersive Video Delivery: From Omnidirectional Video to Holography," in IEEE Communications Surveys & Tutorials, vol. 25, no. 2, pp. 1336-1375, Secondquarter 2023, doi: 10.1109/COMST.2023.3263252.

Tuesday, November 28, 2023

MPEG news: a report from the 144th meeting

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG News Archive

The 144th MPEG meeting was held in Hannover, Germany! For those interested, the press release with all the details is available. It’s great to see progress being made in person (cf. also the group pictures below).

Attendees of the 144th MPEG meeting in Hannover, Germany.

The main outcome of this meeting is as follows:

  • MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
  • MPEG evaluates Call for Proposals on Feature Compression for Video Coding for Machines
  • MPEG progresses ISOBMFF-related Standards for the Carriage of Network Abstraction Layer Video Data
  • MPEG enhances the Support of Energy-Efficient Media Consumption
  • MPEG ratifies the Support of Temporal Scalability for Geometry-based Point Cloud Compression
  • MPEG reaches the First Milestone for the Interchange of 3D Graphics Formats
  • MPEG announces Completion of Coding of Genomic Annotations

We have modified the press release to cater to the readers of ACM SIGMM Records and highlighted research on video technologies. This edition of the MPEG column focuses on MPEG Systems-related standards and visual quality assessment. As usual, the column will end with an update on MPEG-DASH.

Visual Quality Assessment

MPEG does not create standards in the visual quality assessment domain. However, it conducts visual quality assessments for its standards during various stages of the standardization process. For instance, it evaluates responses to calls for proposals, conducts verification tests of its final standards, and so on. MPEG Visual Quality Assessment (AG 5) issued an open call to study quality assessment for learning-based video codecs. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies have focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. To facilitate the study of visual quality, MPEG maintains the Compressed Video for the study of Quality Metrics (CVQM) dataset.
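
Such correlation studies typically report the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between objective metric scores and mean opinion scores (MOS). A minimal sketch, assuming scipy; the logistic fitting step that often precedes PLCC computation is omitted here.

```python
from scipy.stats import pearsonr, spearmanr

def metric_vs_mos(metric_scores, mos):
    """PLCC (linear agreement) and SROCC (monotonic agreement) between
    an objective quality metric and subjective mean opinion scores."""
    plcc, _ = pearsonr(metric_scores, mos)
    srocc, _ = spearmanr(metric_scores, mos)
    return plcc, srocc
```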

With the recent advancements in learning-based video compression algorithms, MPEG is now studying compression using these codecs. It is expected that reconstructed videos compressed using learning-based codecs will have different types of distortion compared to those induced by traditional block-based motion-compensated video coding designs. To gain a deeper understanding of these distortions and their impact on visual quality, MPEG has issued a public call related to learning-based video codecs. MPEG is open to inputs in response to the call and will invite responses that meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.

Considering the rapid advancements in the development of learning-based video compression algorithms, MPEG will keep this call open and anticipates future updates to the call.

Interested parties are kindly requested to contact the MPEG AG 5 Convenor Mathias Wien (wien@lfb.rwth-aachen.de) and submit responses for review at the 145th MPEG meeting in January 2024. Further details are given in the call, issued as AG 5 document N 104 and available from the mpeg.org website.

Research aspects: Learning-based data compression (e.g., for image, audio, video content) is a hot research topic. Research on this topic relies on datasets offering a set of common test sequences, sometimes also common test conditions, that are publicly available and allow for comparison across different schemes. MPEG’s Compressed Video for the study of Quality Metrics (CVQM) dataset is such a dataset, available here, and ready to be used also by researchers and scientists outside of MPEG. The call mentioned above is open for everyone inside/outside of MPEG and allows researchers to participate in international standards efforts (note: to attend meetings, one must become a delegate of a national body).

MPEG Systems-related Standards

At the 144th MPEG meeting, MPEG Systems (WG 3) produced three newsworthy items as follows:

  • Progression of ISOBMFF-related standards for the carriage of Network Abstraction Layer (NAL) video data.
  • Enhancement of the support of energy-efficient media consumption.
  • Support of temporal scalability for geometry-based Point Cloud Compression (G-PCC).

ISO/IEC 14496-15, a part of the family of ISOBMFF-related standards, defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). This standard has been further improved with the approval of the Final Draft Amendment (FDAM), which adds support for enhanced features such as Picture-in-Picture (PiP) use cases enabled by VVC.
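
For readers new to ISOBMFF: the format is a tree of "boxes", each starting with a 32-bit big-endian size and a four-character type, and carriage standards such as ISO/IEC 14496-15 define which boxes hold the NAL-unit video data. A minimal sketch of walking the top-level boxes of a file (illustrative only; size == 0 "to end of file" boxes are not handled):

```python
import struct

def iter_boxes(path):
    """Yield (type, size) for each top-level ISOBMFF box in `path`."""
    with open(path, "rb") as f:
        while (header := f.read(8)) and len(header) == 8:
            size, box_type = struct.unpack(">I4s", header)
            if size == 1:  # 64-bit "largesize" follows the box header
                size = struct.unpack(">Q", f.read(8))[0]
                payload = size - 16
            else:
                payload = size - 8
            yield box_type.decode("latin-1"), size
            f.seek(payload, 1)  # skip payload to the next top-level box
```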

In addition to the improvements made to ISO/IEC 14496-15, separately developed amendments have been consolidated in the 7th edition of the standard. This edition has been promoted to Final Draft International Standard (FDIS), marking the final milestone of the formal standard development.

Another important standard in development is the 2nd edition of ISO/IEC 14496-32 (file format reference software and conformance). This standard, currently at the Committee Draft (CD) stage of development, is planned to be completed and reach the status of Final Draft International Standard (FDIS) by the beginning of 2025. This standard will be essential for industry professionals who require a reliable and standardized method of verifying the conformance of their implementation.

MPEG Systems (WG 3) also promoted ISO/IEC 23001-11 (energy-efficient media consumption (green metadata)) Amendment 1 to Final Draft Amendment (FDAM). This amendment introduces energy-efficient media consumption (green metadata) for Essential Video Coding (EVC) and defines metadata that enables a reduction in decoder power consumption. At the same time, ISO/IEC 23001-11 Amendment 2 has been promoted to the Committee Draft Amendment (CDAM) stage of development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025.

Finally, MPEG Systems (WG 3) promoted ISO/IEC 23090-18 (carriage of geometry-based point cloud compression data) Amendment 1 to Final Draft Amendment (FDAM). This amendment enables the compression of a single elementary stream of point cloud data using ISO/IEC 23090-9 (geometry-based point cloud compression) and storing it in more than one track of ISO Base Media File Format (ISOBMFF)-based files. This enables support for applications that require multiple frame rates within a single file and introduces a track grouping mechanism to indicate multiple tracks carrying a specific temporal layer of a single elementary stream separately.

Research aspects: MPEG Systems usually provides standards on top of existing compression standards, enabling efficient storage and delivery of media data (among others). Researchers may use these standards (including reference software and conformance bitstreams) to conduct research in the general area of multimedia systems (cf. ACM MMSys) or, specifically on green multimedia systems (cf. ACM GMSys).

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below with only minor updates compared to the last meeting.

MPEG-DASH Status, October 2023.

In particular, the 6th edition of MPEG-DASH is scheduled for 2024 but may not include all amendments under development. An overview of existing amendments can be found in the blog post from the last meeting. Current amendments have been (slightly) updated and progressed toward completion in the upcoming meetings. The signaling of haptics in DASH has been discussed and accepted for inclusion in the Technologies under Consideration (TuC) document. The TuC document comprises candidate technologies for possible future amendments to the MPEG-DASH standard and is publicly available here.

Research aspects: MPEG-DASH has been heavily researched in the multimedia systems, quality, and communications research communities. Adding haptics to MPEG-DASH would provide another dimension worth considering within research, including, but not limited to, performance aspects and Quality of Experience (QoE).

The 145th MPEG meeting will be online from January 22-26, 2024. Click here for more information about MPEG meetings and their developments.