Multimedia Communication

Friday, November 28, 2025

MPEG news: a report from the 152nd meeting

This version of the blog post is also available at ACM SIGMM Records.

The 152nd MPEG meeting took place in Geneva, Switzerland, from October 7 to October 11, 2025. The official MPEG press release can be found here. This column highlights key points from the meeting, amended with research aspects relevant to the ACM SIGMM community:

MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF). A separate press release regarding this achievement is available here.
JVET ratified new editions of VSEI, VVC, and HEVC
The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized
Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated

MPEG Systems received an Emmy® Award for the Common Media Application Format (CMAF)

On September 18, 2025, the National Academy of Television Arts & Sciences (NATAS) announced that the MPEG Systems Working Group (ISO/IEC JTC 1/SC 29/WG 3) had been selected as a recipient of a Technology & Engineering Emmy® Award for standardizing the Common Media Application Format (CMAF). But what is CMAF? CMAF (ISO/IEC 23000-19) is a media format standard designed to simplify and unify video streaming workflows across different delivery protocols and devices. Here’s a brief overview. Before CMAF, streaming services often had to package the same content in multiple container formats, e.g., the ISO Base Media File Format (ISOBMFF) for MPEG-DASH and the MPEG-2 Transport Stream (TS) for Apple HLS. This duplication resulted in additional encoding, packaging, and storage costs. I wrote a blog post about this some time ago here. CMAF’s main goal is to define a single, standardized segmented media format usable by both HLS and DASH, enabling “encode once, package once, deliver everywhere.”

The core concept of CMAF is that it is based on ISOBMFF, the foundation of MP4. Each CMAF stream consists of a CMAF header, CMAF media segments, and CMAF track files (a logical sequence of segments for one stream, e.g., video or audio). CMAF enables low-latency streaming by allowing segments to be transferred progressively: a segment can be split into smaller CMAF chunks, each of which can be delivered (e.g., using HTTP chunked transfer encoding) as soon as it has been encoded. CMAF defines interoperable profiles for codecs and presentation types for video, audio, and subtitles. Thanks to its compatibility with, and adoption within, existing streaming standards, CMAF bridges the gap between DASH and HLS, creating a unified ecosystem.
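To make the header/segment/chunk structure more concrete: an ISOBMFF file is a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. In a CMAF track, the header consists of the ftyp and moov boxes, and each chunk is essentially a moof+mdat pair (a segment may additionally start with an styp box). The Python sketch below walks the top-level boxes of a byte stream and labels them accordingly. It is a simplified illustration, not a conformance checker: the helper names are hypothetical, and the byte stream is synthetic (the moov box is an empty placeholder).

```python
import struct
from typing import Iterator, List, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, start offset, size) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Every ISOBMFF box starts with a 32-bit big-endian size and a 4-char type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size == 1:
            # size == 1: a 64-bit 'largesize' field follows the type.
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
        elif size == 0:
            # size == 0: the box extends to the end of the enclosing data.
            size = end - offset
        if size < 8 or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield box_type.decode("ascii"), offset, size
        offset += size


def classify_cmaf(data: bytes) -> List[str]:
    """Label top-level boxes as belonging to the CMAF header or to media data."""
    labels = []
    for box_type, _, _ in iter_boxes(data):
        if box_type in ("ftyp", "moov"):
            labels.append(f"header:{box_type}")  # CMAF header
        elif box_type in ("styp", "moof", "mdat"):
            labels.append(f"media:{box_type}")   # segment / chunk boxes
        else:
            labels.append(f"other:{box_type}")
    return labels


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box with a 32-bit size field (synthetic test data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic stream: ftyp with major brand 'cmfc' and minor version 0,
# an empty placeholder moov, and one moof+mdat pair (a single chunk).
stream = (make_box(b"ftyp", b"cmfc" + struct.pack(">I", 0))
          + make_box(b"moov")
          + make_box(b"moof")
          + make_box(b"mdat", bytes(16)))
print(classify_cmaf(stream))
```

Running it prints ['header:ftyp', 'header:moov', 'media:moof', 'media:mdat']. A low-latency packager would emit the moof+mdat pairs one by one as they become available, which is exactly what makes progressive (chunked) delivery possible.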

Research aspects include – but are not limited to – low-latency tuning (segment/chunk size trade-offs, HTTP/3, QUIC), Quality of Experience (QoE) impact of chunk-based adaptation, synchronization of live and interactive CMAF streams, edge-assisted CMAF caching and prediction, and interoperability testing and compliance tools.

JVET ratified new editions of VSEI, VVC, and HEVC

At its 40th meeting, the Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) concluded the standardization work on the next editions of three key video coding standards, advancing them to the Final Draft International Standard (FDIS) stage. Corresponding twin-text versions have also been submitted to ITU-T for consent procedures. The finalized standards include:

Versatile Supplemental Enhancement Information (VSEI) — ISO/IEC 23002-7 | ITU-T Rec. H.274
Versatile Video Coding (VVC) — ISO/IEC 23090-3 | ITU-T Rec. H.266
High Efficiency Video Coding (HEVC) — ISO/IEC 23008-2 | ITU-T Rec. H.265

The primary focus of these new editions is the extension and refinement of Supplemental Enhancement Information (SEI) messages, which provide metadata and auxiliary data to support advanced processing, interpretation, and quality management of coded video streams.

The updated VSEI specification introduces both new and refined SEI message types supporting advanced use cases:

AI-driven processing: Extensions for neural-network-based post-filtering and film grain synthesis offer standardized signalling for machine learning components in decoding and rendering pipelines.
Semantic and multimodal content: New SEI messages describe modality indicators (e.g., infrared and X-ray), region packing, and object mask encoding, creating interoperability points for multimodal fusion and object-aware compression research.
Pipeline optimization: Messages defining processing order and post-processing nesting support research on joint encoder-decoder optimization and edge-cloud coordination in streaming architectures.
Authenticity and generative media: A new set of messages supports digital signature embedding and generative-AI-based face encoding, raising questions for the SIGMM community about trust, authenticity, and ethical AI in media pipelines.
Metadata and interpretability: New SEIs for text description, image format metadata, and AI usage restriction requests could facilitate research into explainable media, human-AI interaction, and regulatory compliance in multimedia systems.
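All of the message types above travel in the same container. In both HEVC and VVC, an SEI NAL unit carries one or more sei_message() structures, each starting with a payloadType and a payloadSize that are coded as a run of 0xFF bytes (each adding 255) followed by one final byte. Before parsing, emulation prevention bytes (a 0x03 that follows two zero bytes) must be removed from the NAL unit payload. The following Python sketch parses these message headers under simplifying assumptions: it does not interpret any payload, and it treats the last byte as rbsp_trailing_bits instead of implementing the full more_rbsp_data() check. The function names are hypothetical.

```python
from typing import List, Tuple


def nal_payload_to_rbsp(payload: bytes) -> bytes:
    """Remove emulation prevention bytes: a 0x03 following two 0x00 bytes is dropped."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # discard emulation_prevention_three_byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


def read_ff_coded_value(rbsp: bytes, pos: int) -> Tuple[int, int]:
    """Read a value coded as a run of 0xFF bytes (255 each) plus one final byte."""
    value = 0
    while rbsp[pos] == 0xFF:
        value += 255
        pos += 1
    value += rbsp[pos]
    return value, pos + 1


def parse_sei_messages(rbsp: bytes) -> List[Tuple[int, bytes]]:
    """Return (payloadType, payload bytes) for each SEI message in an SEI RBSP.

    Simplification: parsing stops when only the final rbsp_trailing_bits
    byte remains, instead of implementing more_rbsp_data().
    """
    messages = []
    pos = 0
    while pos < len(rbsp) - 1:
        payload_type, pos = read_ff_coded_value(rbsp, pos)
        payload_size, pos = read_ff_coded_value(rbsp, pos)
        if pos + payload_size > len(rbsp):
            raise ValueError("SEI payload exceeds RBSP length")
        messages.append((payload_type, rbsp[pos:pos + payload_size]))
        pos += payload_size
    return messages


# payloadType 300 is coded as 0xFF, 45 (255 + 45); payloadSize is 3.
# The payload 00 00 01 appears as 00 00 03 01 inside the NAL unit.
rbsp = nal_payload_to_rbsp(bytes([0xFF, 45, 3, 0x00, 0x00, 0x03, 0x01, 0x80]))
print(parse_sei_messages(rbsp))
```

The byte-run coding keeps small payload types (the common case) to a single byte while leaving the number space open, which is what allows new SEI messages such as those listed above to be added without changing the container syntax.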

All VSEI features are fully compatible with the new VVC edition, and most are also supported in HEVC. The new HEVC edition further refines its multi-view profiles, enabling more robust 3D and immersive video use cases.
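Because the SEI machinery is shared, a tool that inspects SEI messages in both codecs mainly has to account for the different NAL unit header layouts. The following Python sketch locates SEI NAL units in an Annex B byte stream. The NAL unit type values (39/40 for prefix/suffix SEI in HEVC, 23/24 in VVC) come from the respective specifications; the function names, the simplified start-code splitting, and the synthetic byte streams are illustrative assumptions.

```python
import re
from typing import List, Tuple

# SEI NAL unit types: HEVC (H.265) uses 39/40, VVC (H.266) uses 23/24.
SEI_TYPES = {
    "hevc": {39: "PREFIX_SEI", 40: "SUFFIX_SEI"},
    "vvc": {23: "PREFIX_SEI", 24: "SUFFIX_SEI"},
}


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream at 0x000001 start codes.

    Trailing zero bytes (e.g., the leading zero of a 4-byte start code)
    are stripped from each unit, a common simplification.
    """
    starts = [m.end() for m in re.finditer(b"\x00\x00\x01", stream)]
    units = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        units.append(stream[start:end].rstrip(b"\x00"))
    return units


def nal_unit_type(unit: bytes, codec: str) -> int:
    """Extract nal_unit_type from the 2-byte NAL unit header."""
    if codec == "hevc":
        # byte 0: forbidden_zero_bit(1) | nal_unit_type(6) | nuh_layer_id MSB(1)
        return (unit[0] >> 1) & 0x3F
    if codec == "vvc":
        # byte 1: nal_unit_type(5) | nuh_temporal_id_plus1(3)
        return (unit[1] >> 3) & 0x1F
    raise ValueError(f"unsupported codec: {codec}")


def find_sei_units(stream: bytes, codec: str) -> List[Tuple[str, bytes]]:
    """Return (kind, payload after the NAL header) for every SEI NAL unit."""
    table = SEI_TYPES[codec]
    found = []
    for unit in split_annexb(stream):
        kind = table.get(nal_unit_type(unit, codec))
        if kind is not None:
            found.append((kind, unit[2:]))
    return found


# Synthetic HEVC stream: a VPS header (0x4001), then a prefix SEI (0x4E01).
hevc_stream = b"\x00\x00\x00\x01\x40\x01\x0c" + b"\x00\x00\x01\x4e\x01\x05\x02\xaa\xbb\x80"
# Synthetic VVC stream: an SPS header (0x0079), then a prefix SEI (0x00B9).
vvc_stream = b"\x00\x00\x01\x00\x79\x11" + b"\x00\x00\x01\x00\xb9\x05\x00\x80"
print(find_sei_units(hevc_stream, "hevc"))
print(find_sei_units(vvc_stream, "vvc"))
```

The returned payloads still contain emulation prevention bytes and the SEI message headers; a real tool would continue by removing the former and parsing the latter.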

The research aspects of these new editions can be summarized as follows. They (i) define new standardized interfaces between neural post-processing and conventional video coding, fostering reproducible and interoperable research on learned enhancement models; (ii) encourage the exploration of metadata-driven adaptation and QoE optimization using SEI-based signals in streaming systems; (iii) open possibilities for cross-layer system research connecting compression, transport, and AI-based decision layers; and (iv) introduce a formal foundation for authenticity verification, content provenance, and AI-generated media signalling, which is relevant to current debates on trustworthy multimedia.

These updates highlight how ongoing MPEG/ITU standardization is evolving toward a more AI-aware, multimodal, and semantically rich media ecosystem, providing fertile ground for experimental and applied research in multimedia systems, coding, and intelligent media delivery.

The fourth edition of Visual Volumetric Video-based Coding (V3C and V-PCC) has been finalized

MPEG Coding of 3D Graphics and Haptics (ISO/IEC JTC 1/SC 29/WG 7) has advanced MPEG-I Part 5 – Visual Volumetric Video-based Coding (V3C and V-PCC) to the Final Draft International Standard (FDIS) stage, marking its fourth edition. This revision introduces major updates to the Visual Volumetric Video-based Coding (V3C) framework, in particular enabling support for an additional bitstream instance: V-DMC (Video-based Dynamic Mesh Compression).

Previously, V3C served as the structural foundation for V-PCC (Video-based Point Cloud Compression) and MIV (MPEG Immersive Video). The new edition extends this flexibility by allowing V-DMC integration, reinforcing V3C as a generic, extensible framework for volumetric and 3D video coding. All instances follow a shared principle, i.e., using conventional 2D video codecs (e.g., HEVC, VVC) for projection-based compression, complemented by specialized tools for mapping, geometry, and metadata handling.
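The shared principle of turning 3D data into 2D images that a conventional video codec can compress can be illustrated with a deliberately simplified Python sketch: points are projected orthographically onto a single plane, the nearest depth per pixel forms a geometry image, and an occupancy map tells the decoder which pixels are valid. This is not the V-PCC algorithm. Real encoders, for instance, segment the cloud into patches, project each patch onto one of several planes, pack the patches into an atlas, and can keep more than one depth layer. All names here are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]
DepthMap = List[List[Optional[int]]]


def project_points(points: List[Point], width: int, height: int) -> DepthMap:
    """Orthographically project integer (x, y, z) points onto the XY plane,
    keeping the nearest point (smallest z) per pixel; None marks empty pixels."""
    depth: DepthMap = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            current = depth[y][x]
            if current is None or z < current:
                depth[y][x] = z
    return depth


def occupancy(depth: DepthMap) -> List[List[int]]:
    """Binary map telling a decoder which pixels carry valid geometry."""
    return [[0 if d is None else 1 for d in row] for row in depth]


def reconstruct(depth: DepthMap) -> List[Point]:
    """Invert the projection: one 3D point per occupied pixel."""
    return [(x, y, d) for y, row in enumerate(depth)
            for x, d in enumerate(row) if d is not None]


points = [(0, 0, 5), (0, 0, 2), (1, 0, 7), (1, 1, 3)]
depth = project_points(points, width=2, height=2)
print(depth)               # geometry image; a 2D codec would compress this
print(occupancy(depth))
print(reconstruct(depth))  # (0, 0, 5) is lost: it was hidden behind (0, 0, 2)
```

The lost point illustrates one of the research questions raised below: how to choose projections and layers so that the mapping to 2D discards as little geometry as possible.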

While V-PCC remains co-specified within Part 5, MIV (Part 12) and V-DMC (Part 29) are standardized separately. The progression to FDIS confirms the technical maturity and architectural stability of the framework.

This evolution opens new research directions as follows: (i) Unified 3D content representation, enabling comparative evaluation of point cloud, mesh, and view-based methods under one coding architecture. (ii) Efficient use of 2D codecs for 3D media, raising questions on mapping optimization, distortion modeling, and geometry-texture compression. (iii) Dynamic and interactive volumetric streaming, relevant to AR/VR, telepresence, and immersive communication research.

The fourth edition of MPEG-I Part 5 thus positions V3C as a cornerstone for future volumetric, AI-assisted, and immersive video systems, bridging standardization and cutting-edge multimedia research.

Responses to the call for evidence on video compression with capability beyond VVC successfully evaluated

The Joint Video Experts Team (JVET, ISO/IEC JTC 1/SC 29/WG 5) has completed the evaluation of submissions to its Call for Evidence (CfE) on video compression with capability beyond VVC. The CfE investigated coding technologies that may surpass the performance of the current Versatile Video Coding (VVC) standard in compression efficiency, computational complexity, and extended functionality.

A total of five submissions were assessed, complemented by ECM16 reference encodings and VTM anchor sequences with multiple runtime variants. The evaluation addressed both compression capability and encoding runtime, as well as low-latency and error-resilience features. All technologies were derived from VTM, ECM, or NNVC frameworks, featuring modified encoder configurations and coding tools rather than entirely new architectures.

Key Findings

In the compression capability test, 76 out of 120 test cases showed at least one submission with a non-overlapping confidence interval compared to the VTM anchor. Several methods outperformed ECM16 in visual quality and achieved notable compression gains at lower complexity. Neural-network-based approaches demonstrated clear perceptual improvements, particularly for 8K HDR content, while gains were smaller for gaming scenarios.
In the encoding runtime test, significant improvements were observed even under strict complexity constraints: 37 of 60 test points (at both 1× and 0.2× runtime) showed statistically significant benefits over VTM. Some submissions achieved faster encoding than VTM, with only a 35% increase in decoder runtime.
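The findings above rest on comparing confidence intervals of subjective scores. The post does not describe the exact statistical procedure used by JVET, so the following Python sketch only illustrates the general idea under stated assumptions: mean opinion scores (MOS) with a normal-approximation 95% confidence interval, and a check whether two intervals are disjoint. Non-overlapping intervals are a conservative indicator of a significant difference. The scores are made-up numbers, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_confidence_interval(scores: Sequence[float],
                            z: float = 1.96) -> Tuple[float, float]:
    """MOS with a normal-approximation confidence interval (z = 1.96 ~ 95%)."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - half_width, mean + half_width


def intervals_do_not_overlap(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the confidence intervals of the two score sets are disjoint."""
    low_a, high_a = mos_confidence_interval(a)
    low_b, high_b = mos_confidence_interval(b)
    return high_a < low_b or high_b < low_a


# Hypothetical ratings from eight viewers for one test case.
anchor = [5, 6, 5, 6, 5, 6, 5, 6]
candidate_a = [8, 9, 8, 9, 8, 9, 8, 9]
candidate_b = [5, 7, 6, 6, 5, 7, 6, 6]
print(intervals_do_not_overlap(anchor, candidate_a))
print(intervals_do_not_overlap(anchor, candidate_b))
```

With so few viewers, a Student-t quantile would give slightly wider (more accurate) intervals than the fixed z = 1.96 assumed here.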

Research Relevance and Outlook

The CfE results illustrate a maturing convergence between model-based and data-driven video coding, raising research questions highly relevant for the ACM SIGMM community:

How can learned prediction and filtering networks be integrated into standard codecs while preserving interoperability and runtime control?
What methodologies can best evaluate perceptual quality beyond PSNR, especially for HDR and immersive content?
How can complexity-quality trade-offs be optimized for diverse hardware and latency requirements?

Building on these outcomes, JVET is preparing a Call for Proposals (CfP) for the next-generation video coding standard, with a draft planned for early 2026 and evaluation through 2027. Upcoming activities include refining test material, adding Reference Picture Resampling (RPR), and forming a new ad hoc group on hardware implementation complexity.

For multimedia researchers, this CfE marks a pivotal step toward AI-assisted, complexity-adaptive, and perceptually optimized compression systems, which are considered a key frontier where codec standardization meets intelligent multimedia research.

The 153rd MPEG meeting will be held online from January 19 to January 23, 2026. Click here for more information about MPEG meetings and their developments.

Tuesday, October 14, 2025

Happy World Standards Day 2025!

Celebrating innovation, interoperability, and collaboration through international standards.

Every year on October 14, we celebrate World Standards Day — honoring the collective efforts of experts and organizations worldwide who develop and maintain the standards that make modern digital life possible. For the Moving Picture Experts Group (MPEG), this day marks decades of work in defining the technologies that power media, streaming, and immersive experiences worldwide.

A Year of Progress and New Milestones

Over the past year, MPEG and its working groups achieved remarkable progress across video, audio, systems, and AI-driven technologies — advancing the future of multimedia communication. Hot off the press, MPEG is proud to announce another Emmy® Technology & Engineering Award — this time for the Common Media Application Format (CMAF; ISO/IEC 23000-19), a landmark standard that brought long-awaited harmonization between DASH and HLS streaming formats (among others).

Next Generation Video Coding Beyond VVC

The Joint Video Experts Team (JVET), a joint effort of ISO/IEC and ITU-T, launched a Call for Evidence exploring technologies that go beyond Versatile Video Coding (VVC).

The goal: to identify breakthroughs that significantly improve compression efficiency, runtime performance, and functionality — from HDR and 8K video to gaming and user-generated content. Depending on the results, a Call for Proposals (CfP) for the next generation of video coding may follow in 2026, opening the door to AI-enhanced compression.

The current plan foresees a draft CfP in January 2026, followed by the final CfP in July 2026 and submissions in November 2026, with evaluations scheduled for January 2027. The first version of the resulting standard is expected to be finalized within three years thereafter.

MPEG-DASH (Sixth Edition)

Adaptive streaming continues to evolve, and the sixth edition of MPEG-DASH (ISO/IEC 23009-1) marks a major step forward. New features include enhanced low-latency streaming, content steering across multiple CDNs, compact signaling for faster playback, and even support for interactive storylines — enabling richer, more dynamic media experiences. MPEG-DASH remains the foundation of scalable, interoperable video streaming used by billions of devices worldwide.
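Content steering lets a steering server tell clients which CDN to prefer, and lets clients move to another CDN when one fails. The following Python sketch assumes a JSON steering response with TTL and SERVICE-LOCATION-PRIORITY fields, modeled on the DASH-IF content steering specification, and BaseURL elements identified by their serviceLocation attribute. Treat these field names as assumptions to verify against the specification; the fallback rule, the function name, and the example.com URLs are our own illustrative choices.

```python
import json
from typing import AbstractSet, Dict, Optional, Tuple


def select_base_url(steering_json: str,
                    base_urls: Dict[str, str],
                    failed: AbstractSet[str] = frozenset()) -> Tuple[str, Optional[int]]:
    """Pick the BaseURL of the highest-priority, non-failed service location.

    base_urls maps a BaseURL@serviceLocation value to its URL (in MPD order).
    Returns the chosen URL and the TTL (seconds) after which the client would
    re-fetch the steering response.
    """
    doc = json.loads(steering_json)
    ttl = doc.get("TTL")
    for location in doc.get("SERVICE-LOCATION-PRIORITY", []):
        if location in base_urls and location not in failed:
            return base_urls[location], ttl
    # Fallback (assumption): use the first BaseURL from the MPD that has not failed.
    for location, url in base_urls.items():
        if location not in failed:
            return url, ttl
    raise RuntimeError("no usable service location")


base_urls = {
    "cdn-a": "https://cdn-a.example.com/video/",
    "cdn-b": "https://cdn-b.example.com/video/",
}
steering = '{"VERSION": 1, "TTL": 300, "SERVICE-LOCATION-PRIORITY": ["cdn-b", "cdn-a"]}'
print(select_base_url(steering, base_urls))                    # prefers cdn-b
print(select_base_url(steering, base_urls, failed={"cdn-b"}))  # falls back to cdn-a
```

Keeping the priority list on the server side is what allows an operator to shift load between CDNs without re-publishing the MPD.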

AI and Machine-Oriented Coding

MPEG’s vision for Audio and Video Coding for Machines continues to take shape. The updated Call for Proposals on Audio Coding for Machines (ACoM) invites technologies for efficiently compressing audio and multi-dimensional signals — not only for human listening but also for machine learning and AI-driven analysis. In parallel, Video Coding for Machines (VCM) is being standardized to optimize visual data for computer vision and autonomous systems, reducing bitrate while preserving task-relevant features.

Open Font Format (Fifth Edition)

MPEG Systems (WG 3) reached the Final Draft International Standard (FDIS) stage for the fifth edition of the Open Font Format (ISO/IEC 14496-22). This major update removes previous technical constraints, supporting over 64K glyphs and the entire Unicode range in a single file — a leap toward more inclusive digital typography across languages and writing systems.
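The “64K” figure stems from 16-bit fields: in classic OpenType/OFF fonts, the glyph count (maxp.numGlyphs) and glyph IDs are unsigned 16-bit values, and the widely used cmap format 4 only covers the Basic Multilingual Plane (code points above U+FFFF need format 12). The post does not say how the fifth edition lifts these limits, so the following Python sketch only illustrates the previous constraints. The function names are hypothetical.

```python
import struct
from typing import Iterable

UINT16_MAX = 0xFFFF  # maxp.numGlyphs and glyph IDs are 16-bit unsigned in classic fonts


def fits_16bit_glyph_ids(num_glyphs: int) -> bool:
    """Classic fonts store numGlyphs as uint16, capping a file at 65,535 glyphs."""
    return 0 <= num_glyphs <= UINT16_MAX


def needs_supplementary_planes(codepoints: Iterable[int]) -> bool:
    """cmap format 4 covers only U+0000..U+FFFF; higher code points need format 12."""
    return any(cp > 0xFFFF for cp in codepoints)


# The Unicode code space (U+0000..U+10FFFF) has 1,114,112 code points,
# far more than 65,535 glyph IDs can address.
try:
    struct.pack(">H", 70_000)  # 70,000 glyphs do not fit into a uint16 field
except struct.error as exc:
    print("uint16 overflow:", exc)

print(fits_16bit_glyph_ids(65_535), fits_16bit_glyph_ids(70_000))
print(needs_supplementary_planes([0x41, 0x1F600]))  # 'A' and an emoji outside the BMP
```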

3D and Volumetric Media Innovation

MPEG advanced two pivotal 3D graphics standards to final draft status: Video-Based Dynamic Mesh Coding (V-DMC) and Low Latency Point Cloud Compression (L3C2). These technologies support real-time 3D content, from immersive AR/VR experiences to LiDAR-based perception in autonomous vehicles, enabling efficient, low-latency, and interoperable volumetric media.

Ensuring Media Authenticity

New amendments to MPEG Audio standards introduce mechanisms for Media Authenticity, allowing verification of content integrity and provenance across audio, video, and system layers. This step is essential for a trustworthy digital media ecosystem.
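The post does not detail the mechanisms defined in these amendments. As a generic illustration of content integrity checking (not MPEG’s design), the following Python sketch hashes each media chunk on the producer side and lets a consumer detect which chunks were altered. Provenance additionally requires binding the digest list to its creator, e.g., with a digital signature, which this sketch omits. The chunk contents and function names are hypothetical.

```python
import hashlib
import hmac
from typing import List, Sequence


def digest_chunks(chunks: Sequence[bytes]) -> List[str]:
    """Producer side: one SHA-256 digest per media chunk."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def tampered_indices(chunks: Sequence[bytes], manifest: Sequence[str]) -> List[int]:
    """Consumer side: indices of chunks whose digest does not match the manifest."""
    if len(chunks) != len(manifest):
        raise ValueError("chunk count does not match the manifest")
    # compare_digest avoids timing side channels when comparing digests.
    return [i for i, (chunk, expected) in enumerate(zip(chunks, manifest))
            if not hmac.compare_digest(hashlib.sha256(chunk).hexdigest(), expected)]


original = [b"audio-frame-0", b"audio-frame-1", b"audio-frame-2"]
manifest = digest_chunks(original)  # in practice, the manifest itself would be signed
received = [b"audio-frame-0", b"audio-frame-X", b"audio-frame-2"]
print(tampered_indices(received, manifest))
```

Hashing per chunk rather than per file lets a streaming client verify content as it arrives and pinpoint where a modification occurred.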

Genomics and AI Meet Multimedia

MPEG also looked beyond traditional media: the MPEG-G Genomics Hackathon, co-organized with partners such as Stanford Medicine, Philips, and Fudan University, challenges researchers to apply AI to microbiome data encoded in MPEG-G format. The goal: uncover new biomedical insights through standard-based, interoperable data compression.

Looking Ahead

From next-generation video compression and AI-enhanced codecs to trustworthy media and adaptive streaming, MPEG continues to define the building blocks of interoperable multimedia. As new technologies reshape how we experience and analyze content, standards ensure that innovation remains open, efficient, and globally accessible.

On this World Standards Day, we celebrate the dedication of all MPEG experts and contributors to shaping a smarter, more connected multimedia future.

Learn more at www.mpeg.org and stay tuned for updates from the next MPEG meeting in early 2026.

Wednesday, July 16, 2025

Full Professor of Virtual and Augmented Reality (all genders welcome)

The official and legally binding job description is available here.

The University of Klagenfurt wants to attract more qualified women for professorships.

The University of Klagenfurt is pleased to announce the following open position in the Department of Information Technology (ITEC) within the Faculty of Technical Sciences, in compliance with the provisions of Art. 98 (open-ended) or Art. 99 (limited to 5 years) of the Austrian Universities Act:

Full Professor of Virtual and Augmented Reality (all genders welcome)

This is a full-time position. Whether the position will be implemented in compliance with the provisions of Art. 98 Austrian Universities Act (open-ended) or Art. 99 of the Austrian Universities Act (limited to 5 years) will be decided in the course of the appointment procedure.

The University of Klagenfurt is a young, vibrant, and innovative university, located at the intersection of Alpine and Mediterranean culture in an area that offers an exceptionally high quality of life. As a public university pursuant to Art. 6 of the Austrian Universities Act, it receives federal funding. The Times Higher Education (THE) Young University Rankings 2021 ranked it among the 50 best young universities in the world. The university operates under the motto “Beyond Boundaries!”.

In accordance with its key strategic road map, the development plan, the university’s primary guiding principles and objectives include the pursuit of scientific excellence regarding the appointment of professors, favourable research conditions, a good faculty-student ratio, and the promotion of the development of young scientists.

The professorship will be embedded in the Department of Information Technology (ITEC; https://itec.aau.at/) within the Faculty of Technical Sciences (https://www.aau.at/en/tewi), which focuses on distributed multimedia systems, including multimedia coding, transmission, and quality of experience, AI-based multimedia analysis, game studies and engineering, as well as distributed cloud and edge computing. The department and faculty provide a lively, friendly, and research-oriented environment. We are looking for a highly qualified and internationally recognized scientist with high engagement in developing and sustaining an ambitious and innovative research programme.

Virtual and Augmented Reality (VR/AR) are broad research fields addressing both theoretical and application-driven questions. This position offers an opportunity to focus on cutting-edge VR/AR research areas including – but not limited to – immersive media (e.g., 360° videos, 3D point clouds), AI for object recognition in VR/AR (e.g., in industry and medicine), educational and training applications, computer graphics, sensor technology, human-computer interaction, and efficient multimedia data transmission and cloud/edge processing.

The professor will be involved in teaching in a variety of degree programmes, including the Bachelor’s programmes “Applied Informatics” and “Robotics and Artificial Intelligence”, and the international Master’s programmes “Informatics” and “Game Studies and Engineering”.

The duties of the position include:

Representing the field of Virtual and Augmented Reality in research and teaching
Acquiring and managing competitive research funding
Collaborating with colleagues across the university and with industry partners
Teaching in relevant Bachelor’s, Master’s, and Doctoral programmes
Advising and mentoring students and early career researchers
Contributing to the long-term development of the department and its international standing
Advancing the department’s and faculty’s research priorities, with a commitment to interdisciplinary collaboration
Contributing to university governance and academic self-administration
Engaging in third mission activities and public outreach

Required qualifications:

Habilitation or equivalent qualification in a relevant field
Excellent research standing and publication record in Virtual and/or Augmented Reality, including theoretical and technical foundations
Experience in the acquisition of competitive third-party funded research projects of a relevant volume
Teaching experience at university level and didactic competence
Experience in the (co-)supervision of academic theses
Collaboration and social skills
Fluency in English

Desired qualifications:

Excellent scientific communication and dissemination skills
Interdisciplinary experience
Experience with academic management duties
Competence in leadership and management of teams
Competence in gender mainstreaming and diversity management
Fluency in German

German language skills are not a formal prerequisite, but proficiency at level B2 is expected within two years. The remit of the professorship requires that the successful candidate establish Klagenfurt as their primary place of work.

The university is committed to increasing the number of women among the faculty, particularly in high-level positions, and therefore specifically invites applications from qualified women. Among equally qualified candidates, women will receive preferential consideration. People with disabilities or chronic diseases who meet the qualification criteria are explicitly invited to apply.

The salary is subject to negotiation. The minimum gross salary for the position at this level (salary group A1 for faculty according to the Austrian Universities’ Collective Bargaining Agreement) is currently € 92,500 per year.

In accordance with the Austrian Income Tax Act, an attractive relocation tax allowance can be granted for the first five years in the case of appointments to professorships in Austria. The prerequisites are subject to examination on a case-by-case basis.

Please submit your application in English by e-mail to the University of Klagenfurt, Office of the Senate, attn. Mag.a (FH) Sabine Seebacher via application_professorship@aau.at no later than September 28, 2025, including:

a mandatory principal part not exceeding five pages (https://jobs.aau.at/wp-content/uploads/specimen_main_part_application_professorship.doc). The submission of the mandatory principal part mentioned above constitutes a necessary condition for the validity of your application.
one single PDF including:

a letter of motivation
a detailed scientific CV
a comprehensive list of publications, talks, and courses taught
a list of acquired third-party funded research projects, including role, funding organization, and amount of funding (in case of funding acquired within a consortium, please specify the amount attributed to you)
a compact research statement of up to two pages
supplementary documents, where applicable (e.g., course evaluations)
links to publicly available versions of your three most important publications within the scope of this professorship

For general information, please refer to our website at https://jobs.aau.at/en/the-university-as-employer/. For specific information about the position, please contact Prof. Dr. Christian Timmerer (christian.timmerer@aau.at).

Wednesday, June 18, 2025

Up to 4 Predoc Scientist Positions (all genders welcome)

The University of Klagenfurt, with approximately 1,700 employees and over 13,000 students, is located in the Alps-Adriatic region and consistently achieves excellent placements in rankings. The motto “per aspera ad astra” underscores our firm commitment to the pursuit of excellence in all activities in research, teaching, and university management. The principles of equality, diversity, health, sustainability, and compatibility of work and family life serve as the foundation for our work at the university.

The University of Klagenfurt is in the process of establishing a Karl Popper Kolleg (graduate school) entitled “FruitScope: A DroneScope for Smart Agriculture”. The following positions are open for applicants at this school with an anticipated starting date of October 1, 2025:

Up to 4 Predoc Scientist Positions (all genders welcome)

Level of employment: 75 % (30 hours per week) each
Minimum salary: € 39,005.40 per annum (gross); classification according to collective bargaining agreement: B1
Limited to: 3 years
Application deadline: August 20, 2025
Reference code: 338/25

Tasks and responsibilities:

Independent research and scientific qualification within the Karl Popper Kolleg FruitScope with the aim to acquire the Doctoral Degree in Technical Sciences
Peer-reviewed publication of scientific results in journals and at conferences
Team work and student mentoring
Active participation in public relations activities

This graduate school seeks to push the current bounds of the state of the art in navigation, coordination, sensing, and communication of multi-agent unmanned aerial vehicles (UAVs). The research groups of the involved faculty members publish in top international journals and conference proceedings. Successful applicants will be encouraged and supported to publish and present their work in such journals and proceedings and will have the opportunity to cooperate with our world-renowned international partners in science and industry. We currently cooperate with partners worldwide, mainly in the USA/Canada and Europe. We specifically encourage close and open collaboration with our peers both internationally and at the University and support international exchanges with the universities and research institutions affiliated with the graduate school (e.g., ETH Zurich, MIT, CMU, NASA, UofT, U-Mich, UPenn, Georgia Tech). Our young research groups offer a dynamic and friendly atmosphere and thus a collaborative and inspiring work environment with very modern infrastructure (e.g., one of the largest indoor drone halls in Europe), which is continuously updated and upgraded (e.g., soon with one of the largest outdoor drone test fields in the world).

Prerequisites for the appointment:

Completed Master’s or Diploma degree in electrical engineering, information and communication engineering, mechanical engineering, computer science or related fields. This requirement has an extended deadline and must be fulfilled two weeks before the starting date at the latest; hence, the last possible deadline for meeting this requirement is September 17, 2025.
Proven knowledge and experience in at least one of the following areas: mobile robotics, wireless communications or sensing, multimedia communication, signal processing for communications, or machine learning
Proven programming skills in at least one of the following: MATLAB, C/C++, Java, Python, ROS, or similar
Fluency in English (both written and spoken)

Additional desired qualifications:

Good knowledge of cooperative software development (e.g., with Git)
First scientific publication (apart from Master’s or Diploma thesis) in the area of mobile robotics, wireless sensing, or multimedia communication technology
Relevant international or practical experience
Good scientific communication and presentation skills
German language skills or willingness to acquire German language skills within the first two years of service
Social skills and ability to work independently

Our offer:

The employment contract is concluded for the position as predoc scientist and stipulates a starting salary of € 2,786.10 gross per month (14 times a year; previous experience deemed relevant to the job can be recognized).

The University of Klagenfurt also offers:

Personal and professional advanced training courses, management and career coaching, including bespoke training for women in science
Numerous attractive additional benefits, see also https://jobs.aau.at/en/the-university-as-employer/
Diversity- and family-friendly university culture
The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature and sports

The application:

If you are interested in this position, please apply in English providing the following documents:

Letter of application explaining the motivation and including a statement of interest in research (indicating an idea for the research for your own doctoral degree)
Curriculum vitae (please do not include a photo)
Copies of degree certificates (Bachelor and Master)
Copies of official transcripts (Bachelor and Master) containing a list of all courses and grades
Master’s thesis. If the thesis is not available, the candidate should provide a draft or an explanation.
If an applicant has not received the Master’s degree by the application deadline, the applicant should provide a declaration, written either by a supervisor or by the candidate themselves, on the feasibility of finishing the Master’s degree before September 17, 2025.

To apply, please select the position with the reference code 338/25 in the category “Scientific Staff” using the link “Apply for this position” in the job portal at https://jobs.aau.at/en/.

Candidates must provide proof that they meet the required qualifications by August 20, 2025, at the latest. However, candidates who fulfil the required qualifications but do not yet possess the required Master’s degree can apply, provided they are able to meet this requirement at least two weeks before the starting date. Therefore, the latest possible deadline for meeting this requirement is September 17, 2025.

General information about the university as an employer can be found at https://jobs.aau.at/en/the-university-as-employer/. At the University of Klagenfurt, recruitment and staff matters are accompanied not only by the authority responsible for the recruitment procedure but also by the Equal Opportunities Working Group and, if applicable, by the Representative for Disabled Persons.

For further information on this specific vacancy, please contact:

Prof. Dr. Stephan Weiss, +43 463 2700 3571, Stephan.Weiss@aau.at
Prof. Dr. Christian Bettstetter, +43 463 2700 3640, Christian.Bettstetter@aau.at
Prof. Dr. Bernhard Rinner, +43 463 2700 3671, Bernhard.Rinner@aau.at
Prof. Dr. Christian Timmerer, +43 463 2700 3621, Christian.Timmerer@aau.at

The University of Klagenfurt aims to increase the proportion of women and therefore specifically invites qualified women to apply for the position. Where the qualification is equivalent, women will be given preferential consideration.

People with disabilities or chronic diseases, who fulfil the requirements, are particularly encouraged to apply. Travel and accommodation costs incurred during the application process will not be refunded. Under exceptional circumstances online hearings may be possible. Translations into other languages serve informational purposes only. Solely the version advertised in the University Bulletin (Mitteilungsblatt) shall be legally binding.

Friday, May 9, 2025

MPEG news: a report from the 150th meeting

This version of the blog post is also available at ACM SIGMM Records.

MPEG News Archive

The 150th MPEG meeting was held online from 31 March to 04 April 2025. The official press release can be found here. This blog post provides the following highlights:

Requirements: MPEG-AI strategy and white paper on MPEG technologies for metaverse
JVET: Draft Joint Call for Evidence on video compression with capability beyond Versatile Video Coding (VVC)
Video: Gaussian splat coding and video coding for machines
Audio: Audio coding for machines
3DGH: 3D Gaussian splat coding

MPEG-AI Strategy

The MPEG-AI strategy envisions a future where AI and neural networks are deeply integrated into multimedia coding and processing, enabling transformative improvements in how digital content is created, compressed, analyzed, and delivered. By positioning AI at the core of multimedia systems, MPEG-AI seeks to enhance both content representation and intelligent analysis. This approach supports applications ranging from adaptive streaming and immersive media to machine-centric use cases like autonomous vehicles and smart cities. AI is employed to optimize coding efficiency, generate intelligent descriptors, and facilitate seamless interaction between content and AI systems. The strategy builds on foundational standards such as ISO/IEC 15938-13 (CDVS), 15938-15 (CDVA), and 15938-17 (Neural Network Coding), which collectively laid the groundwork for integrating AI into multimedia frameworks.

Currently, MPEG is developing a family of standards under the ISO/IEC 23888 series that includes a vision document, machine-oriented video coding, and encoder optimization for AI analysis. Future work focuses on feature coding for machines and AI-based point cloud compression to support high-efficiency 3D and visual data handling. These efforts reflect a paradigm shift from human-centric media consumption to systems that also serve intelligent machine agents. MPEG-AI maintains compatibility with traditional media processing while enabling scalable, secure, and privacy-conscious AI deployments. Through this initiative, MPEG aims to define the future of multimedia as an intelligent, adaptable ecosystem capable of supporting complex, real-time, and immersive digital experiences.

MPEG White Paper on Metaverse Technologies

The MPEG white paper on metaverse technologies (cf. MPEG white papers) outlines the pivotal role of MPEG standards in enabling immersive, interoperable, and high-quality virtual experiences that define the emerging metaverse. It identifies core metaverse parameters – real-time operation, 3D experience, interactivity, persistence, and social engagement – and maps them to MPEG’s longstanding and evolving technical contributions. From early efforts like MPEG-4’s Binary Format for Scenes (BIFS) and Animation Framework eXtension (AFX) to MPEG-V’s sensory integration, and the advanced MPEG-I suite, these standards underpin critical features such as scene representation, dynamic 3D asset compression, immersive audio, avatar animation, and real-time streaming. Key technologies like point cloud compression (V-PCC, G-PCC), immersive video (MIV), and dynamic mesh coding (V-DMC) demonstrate MPEG’s capacity to support realistic, responsive, and adaptive virtual environments. Recent efforts include neural network compression for learned scene representations (e.g., NeRFs), haptic coding formats, and scene description enhancements, all geared toward richer user engagement and broader device interoperability.

The document highlights five major metaverse use cases – virtual environments, immersive entertainment, virtual commerce, remote collaboration, and digital twins – all supported by MPEG innovations. It emphasizes the foundational role of MPEG-I standards (e.g., Parts 12, 14, 29, 39) for synchronizing immersive content, representing avatars, and orchestrating complex 3D scenes across platforms. Future challenges identified include ensuring interoperability across systems, advancing compression methods for AI-assisted scenarios, and embedding security and privacy protections. With decades of multimedia expertise and a future-focused standards roadmap, MPEG positions itself as a key enabler of the metaverse – ensuring that emerging virtual ecosystems are scalable, immersive, and universally accessible.

The MPEG white paper on metaverse technologies highlights several research opportunities, including efficient compression of dynamic 3D content (e.g., point clouds, meshes, neural representations), synchronization of immersive audio and haptics, real-time adaptive streaming, and scene orchestration. It also points to challenges in standardizing interoperable avatar formats, AI-enhanced media representation, and ensuring seamless user experiences across devices. Additional research directions include neural network compression, cross-platform media rendering, and developing perceptual metrics for immersive Quality of Experience (QoE).

Draft Joint Call for Evidence (CfE) on Video Compression beyond Versatile Video Coding (VVC)

The latest JVET AHG report on ECM software development (AHG6), documented as JVET-AL0006, shows promising results. Specifically, the entry in the “Overall” row and the luma (“Y”) column reports a 27.06% improvement in coding efficiency compared to VVC, as shown in the figure below.
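Coding-efficiency figures such as the 27.06% above are usually reported as Bjøntegaard delta rate (BD-rate): the average bitrate difference between two rate-distortion curves at equal quality. We assume here that the ECM figure is a BD-rate; the sketch below implements the classic Bjøntegaard formulation for exactly four rate points per curve (log-bitrate interpolated as a cubic in PSNR, integrated over the overlapping PSNR range). JVET's own reporting templates may use a different interpolation, so results can differ slightly from official numbers. The function names are hypothetical.

```python
import math

def _lagrange(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def _integral(xs, ys, a, b):
    """Integrate the interpolating cubic over [a, b].

    Simpson's rule is exact for polynomials up to degree 3, so this is exact here.
    """
    m = 0.5 * (a + b)
    return (b - a) / 6.0 * (_lagrange(xs, ys, a)
                            + 4.0 * _lagrange(xs, ys, m)
                            + _lagrange(xs, ys, b))

def bd_rate(anchor, test):
    """BD-rate in percent; negative values mean the test codec needs fewer bits.

    anchor, test: lists of exactly four (bitrate, psnr_db) points with distinct PSNRs.
    """
    if len(anchor) != 4 or len(test) != 4:
        raise ValueError("this sketch expects exactly four rate points per curve")
    pa, la = [p for _, p in anchor], [math.log(r) for r, _ in anchor]
    pt, lt = [p for _, p in test], [math.log(r) for r, _ in test]
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    # Average log-rate difference over the common quality interval
    avg_diff = (_integral(pt, lt, lo, hi) - _integral(pa, la, lo, hi)) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0
```

For instance, a test codec that reaches the same PSNR values as the anchor with 10% less bitrate at every rate point yields a BD-rate of about -10%.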

The Draft Joint Call for Evidence (CfE) on video compression beyond VVC (Versatile Video Coding), identified as document JVET-AL2026 | N 355, is being developed to explore new advancements in video compression. The CfE seeks evidence in three main areas: (a) improved compression efficiency and associated trade-offs, (b) encoding under runtime constraints, and (c) enhanced performance in additional functionalities. This initiative aims to evaluate whether new techniques can significantly outperform the current state-of-the-art VVC standard in both compression and practical deployment aspects.

The visual testing will be carried out across seven categories, including various combinations of resolution, dynamic range, and use cases: SDR Random Access UHD/4K, SDR Random Access HD, SDR Low Bitrate HD, HDR Random Access 4K, HDR Random Access Cropped 8K, Gaming Low Bitrate HD, and UGC (User-Generated Content) Random Access HD. Sequences and rate points for testing have already been defined and agreed upon. For a fair comparison, rate-matched anchors using VTM (VVC Test Model) and ECM (Enhanced Compression Model) will be generated, with new configurations to enable reduced run-time evaluations. A dry-run of the visual tests is planned during the upcoming Daejeon meeting, with ECM and VTM as reference anchors, and the CfE welcomes additional submissions. Following this dry-run, the final Call for Evidence is expected to be issued in July, with responses due in October.

The Draft Joint Call for Evidence (CfE) on video compression beyond VVC invites research into next-generation video coding techniques that offer improved compression efficiency, reduced encoding complexity under runtime constraints, and enhanced functionalities such as scalability or perceptual quality. Key research aspects include optimizing the trade-off between bitrate and visual fidelity, developing fast encoding methods suitable for constrained devices, and advancing performance in emerging use cases like HDR, 8K, gaming, and user-generated content.

3D Gaussian Splat Coding

Gaussian splatting is a real-time radiance field rendering method that represents a scene using 3D Gaussians. Each Gaussian has parameters like position, scale, color, opacity, and orientation, and together they approximate how light interacts with surfaces in a scene. Instead of ray marching (as in NeRF), it renders images by splatting the Gaussians onto a 2D image plane and blending them using a rasterization pipeline, which is GPU-friendly and much faster. Developed by Kerbl et al. (2023), it is capable of real-time rendering (60+ fps) and outperforms previous NeRF-based methods in speed and visual quality. Gaussian splat coding refers to the compression and streaming of 3D Gaussian representations for efficient storage and transmission. It's an active research area and under standardization consideration in MPEG.
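The blending step described above can be sketched per pixel: Gaussians are sorted by depth and composited front to back, C = sum over i of c_i * alpha_i * prod_{j<i}(1 - alpha_j), where alpha_i is the Gaussian's opacity weighted by its 2D falloff at the pixel. This is a simplified illustration, assuming the Gaussians are already projected to screen space with a known inverse 2x2 covariance; the `Splat` type and `shade_pixel` function are hypothetical. The original rasterizer additionally works on screen tiles, derives view-dependent color from spherical harmonics, and clamps alpha, none of which is shown.

```python
import math
from dataclasses import dataclass

@dataclass
class Splat:
    """A Gaussian already projected to screen space (hypothetical simplified type)."""
    mean: tuple      # (u, v) pixel coordinates of the projected center
    inv_cov: tuple   # inverse 2x2 covariance as (a, b, c) for [[a, b], [b, c]]
    color: tuple     # (r, g, b) in [0, 1]
    opacity: float   # learned opacity in [0, 1]
    depth: float     # view-space depth used for sorting

def shade_pixel(splats, px, py, t_min=1e-4):
    """Front-to-back alpha compositing of depth-sorted splats at pixel (px, py)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s.depth):
        dx, dy = px - s.mean[0], py - s.mean[1]
        a, b, c = s.inv_cov
        # Gaussian falloff: exp(-0.5 * d^T * inv_cov * d)
        power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
        alpha = s.opacity * math.exp(power)
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < t_min:  # stop once the pixel is (nearly) opaque
            break
    return tuple(color), transmittance
```

Because only sorting and a running product are needed per pixel, this maps well onto GPU rasterization, which is where the speed advantage over ray marching comes from.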

The MPEG Technical Requirements working group, together with the MPEG Video Coding working group, started an exploration on Gaussian splat coding, while the MPEG Coding of 3D Graphics and Haptics (3DGH) working group addresses 3D Gaussian splat coding. Draft Gaussian splat coding use cases and requirements are available, and various joint exploration experiments (JEEs) are conducted between meetings.

(3D) Gaussian splat coding is actively researched in academia, also in the context of streaming, e.g., in “LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming” or “LTS: A DASH Streaming System for Dynamic Multi-Layer 3D Gaussian Splatting Scenes”. The research aspects of 3D Gaussian splat coding and streaming span a wide range of areas across computer graphics, compression, machine learning, and systems for real-time immersive media, in particular the efficient representation and transmission of Gaussian-based neural scene representations for real-time rendering. Key areas include compression of Gaussian parameters (position, scale, color, opacity), perceptual and geometry-aware optimizations, and neural compression techniques such as learned latent coding. Streaming challenges involve adaptive, view-dependent delivery, level-of-detail management, and low-latency rendering on edge or mobile devices. Additional research directions include standardizing file formats, integrating with scene graphs, and ensuring interoperability with existing 3D and immersive media frameworks.
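A simple baseline for compressing Gaussian parameters is uniform scalar quantization of each attribute over its own value range. The sketch below is a generic illustration of that idea, not the method used in MPEG's explorations or in the papers cited above; `quantize` and `dequantize` are hypothetical names. Storing 8-bit codes instead of 32-bit floats shrinks an attribute's raw size by a factor of 4 before any entropy coding, at the cost of a reconstruction error bounded by half a quantization step.

```python
def quantize(values, bits):
    """Uniformly quantize floats to `bits`-bit integer codes over their min/max range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the range ends
    codes = [min(max(round((v - lo) / step), 0), levels) for v in values]
    return codes, lo, step  # lo and step must be transmitted as side information

def dequantize(codes, lo, step):
    """Map integer codes back to reconstruction values."""
    return [lo + q * step for q in codes]
```

In practice, attributes differ in sensitivity (positions typically need more bits than opacity, for example), which is one reason perceptual and geometry-aware bit allocation is an open research question.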

MPEG Audio and Video Coding for Machines

The Call for Proposals on Audio Coding for Machines (ACoM), issued by the MPEG audio coding working group, aims to develop a standard for efficiently compressing audio, multi-dimensional signals (e.g., medical data), or extracted features for use in machine-driven applications. The standard targets use cases such as connected vehicles, audio surveillance, diagnostics, health monitoring, and smart cities, where vast data streams must be transmitted, stored, and processed with low latency and high fidelity. The ACoM system is designed in two phases: the first focusing on near-lossless compression of audio and metadata to facilitate training of machine learning models, and the second expanding to lossy compression of features optimized for specific applications. The goal is to support hybrid consumption – by machines and, where needed, humans – while ensuring interoperability, low delay, and efficient use of storage and bandwidth.

The CfP outlines technical requirements, submission guidelines, and evaluation metrics. Participants must provide decoders compatible with Linux/x86 systems, demonstrate performance through objective metrics like compression ratio, encoder/decoder runtime, and memory usage, and undergo a mandatory cross-checking process. Selected proposals will contribute to a reference model and working draft of the standard. Proponents must register by August 1, 2025, with submissions due in September, and evaluation taking place in October. The selection process emphasizes lossless reproduction, metadata fidelity, and significant improvements over a baseline codec, with a path to merge top-performing technologies into a unified solution for standardization.
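The objective metrics listed above (compression ratio, encoder/decoder runtime, memory usage) and the lossless-reproduction check can be illustrated with a small harness. This is a hypothetical sketch, not the CfP's evaluation software: Python's `zlib` stands in for a proponent codec and is not the CfP baseline codec, and `tracemalloc` only counts memory allocated through Python's allocator, not a native decoder's footprint.

```python
import time
import tracemalloc
import zlib

def evaluate(encode, decode, data: bytes):
    """Measure compression ratio, runtimes, and peak traced memory; verify losslessness."""
    started = not tracemalloc.is_tracing()
    if started:
        tracemalloc.start()
    t0 = time.perf_counter()
    bitstream = encode(data)
    t1 = time.perf_counter()
    restored = decode(bitstream)
    t2 = time.perf_counter()
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    if started:
        tracemalloc.stop()
    return {
        "compression_ratio": len(data) / len(bitstream),
        "encode_s": t1 - t0,
        "decode_s": t2 - t1,
        "peak_bytes": peak,
        "lossless": restored == data,  # bit-exact reproduction check
    }

# Usage with zlib as a stand-in codec (hypothetical input data):
# evaluate(lambda d: zlib.compress(d, 9), zlib.decompress, b"\x00\x01" * 4096)
```

A real evaluation would also cover metadata fidelity and run the cross-check on independent hardware, as the CfP requires.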

Research aspects of Audio Coding for Machines (ACoM) include developing efficient compression techniques for audio and multi-dimensional data that preserve key features for machine learning tasks, optimizing encoding for low-latency and resource-constrained environments, and designing hybrid formats suitable for both machine and human consumption. Additional research areas involve creating interoperable feature representations, enhancing metadata handling for context-aware processing, evaluating trade-offs between lossless and lossy compression, and integrating machine-optimized codecs into real-world applications like surveillance, diagnostics, and smart systems.

The MPEG video coding working group approved the committee draft (CD) for ISO/IEC 23888-2 video coding for machines (VCM). VCM aims to encode visual content in a way that maximizes machine task performance, such as computer vision, scene understanding, autonomous driving, smart surveillance, robotics, and IoT. Instead of preserving photorealistic quality, VCM seeks to retain features and structures important for machines, possibly at much lower bitrates than traditional video codecs.

The CD introduces several new tools and enhancements aimed at improving machine-centric video processing efficiency. These include updates to spatial resampling, such as the signaling of the inner decoded picture size to better support scalable inference. For temporal resampling, the CD enables adaptive resampling ratios and introduces pre- and post-filters within the temporal resampler to maintain task-relevant temporal features. In the filtering domain, it adopts bit depth truncation techniques – integrating bit depth shifting, luma enhancement, and chroma reconstruction – to optimize both signaling efficiency and cross-platform interoperability. Luma enhancement is further refined through an integer-based implementation for luma distribution parameters, while chroma reconstruction is stabilized across different hardware platforms. Additionally, the CD proposes removing the neural network-based in-loop filter (NNLF) to simplify the pipeline. Finally, in terms of bitstream structure, it adopts a flattened structure with new signaling methods to support efficient random access and better coordination with system layers, aligning with the low-latency, high-accuracy needs of machine-driven applications.
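The general idea behind bit depth shifting can be shown in a few lines: drop least significant bits before encoding, then restore the original range after decoding. The CD's actual syntax, rounding rules, and its luma enhancement and chroma reconstruction steps are not reproduced here; this is only a generic sketch with hypothetical function names, assuming floor truncation on the way down and mid-interval reconstruction on the way up, which keeps the error within half a truncation interval.

```python
def shift_down(samples, in_bits, out_bits):
    """Truncate samples from in_bits to out_bits by dropping least significant bits."""
    shift = in_bits - out_bits
    return [s >> shift for s in samples]

def shift_up(codes, in_bits, out_bits):
    """Restore the original bit depth, reconstructing at the middle of each interval."""
    shift = in_bits - out_bits
    offset = (1 << (shift - 1)) if shift > 0 else 0
    return [(q << shift) + offset for q in codes]
```

For 10-bit input reduced to 8 bits, each code covers four original values, so the reconstruction error is at most 2; whether such an error matters depends on the downstream machine task rather than on visual fidelity.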

Research in VCM focuses on optimizing video representation for downstream machine tasks, exploring task-driven compression techniques that prioritize inference accuracy over perceptual quality. Key areas include joint video and feature coding, adaptive resampling methods tailored to machine perception, learning-based filter design, and bitstream structuring for efficient decoding and random access. Other important directions involve balancing bitrate and task accuracy, enhancing robustness across platforms, and integrating machine-in-the-loop optimization to co-design codecs with AI inference pipelines.

Concluding Remarks

The 150th MPEG meeting marks significant progress across AI-enhanced media, immersive technologies, and machine-oriented coding. With ongoing work on MPEG-AI, metaverse standards, next-gen video compression, Gaussian splat representation, and machine-friendly audio and video coding, MPEG continues to shape the future of interoperable, intelligent, and adaptive multimedia systems. The research opportunities and standardization efforts outlined in this meeting provide a strong foundation for innovations that support real-time, efficient, and cross-platform media experiences for both human and machine consumption.

The 151st MPEG meeting will be held in Daejeon, Korea, from 30 June to 04 July 2025. Click here for more information about MPEG meetings and their developments.

Friday, March 14, 2025

MPEG news: a report from the 149th meeting

This blog post is based on the MPEG press release and has been modified/updated here to focus on and highlight research aspects. This version of the blog post will also be posted at ACM SIGMM Records.


The 149th MPEG meeting took place in Geneva, Switzerland, from January 20 to 24, 2025. The official press release can be found here. MPEG promoted three standards (among others) to Final Draft International Standard (FDIS), driving innovation in next-generation, immersive audio and video coding, and adaptive streaming:

MPEG-I Immersive Audio enables realistic 3D audio with six degrees of freedom (6DoF).
MPEG Immersive Video (Second Edition) introduces advanced coding tools for volumetric video.
MPEG-DASH (Sixth Edition) enhances low-latency streaming, content steering, and interactive media.

This blog post focuses on these new standards/editions based on the press release and amended with research aspects relevant to the ACM SIGMM community.

MPEG-I Immersive Audio

At the 149th MPEG meeting, MPEG Audio Coding (WG 6) promoted ISO/IEC 23090-4 MPEG-I immersive audio to Final Draft International Standard (FDIS), marking a major milestone in the development of next-generation audio technology.

MPEG-I immersive audio is a groundbreaking standard designed for the compact and highly realistic representation of spatial sound. Tailored for Metaverse applications, including Virtual, Augmented, and Mixed Reality (VR/AR/MR), it enables seamless real-time rendering of interactive 3D audio with six degrees of freedom (6DoF). Users can not only turn their heads in any direction (pitch/yaw/roll) but also move freely through virtual environments (x/y/z), creating an unparalleled sense of immersion.

True to MPEG’s legacy, this standard is optimized for efficient distribution – even over networks with severe bitrate constraints. Unlike proprietary VR/AR audio solutions, MPEG-I Immersive Audio ensures broad interoperability, long-term stability, and suitability for both streaming and downloadable content. It also natively integrates MPEG-H 3D Audio for high-quality compression.

The standard models a wide range of real-world acoustic effects to enhance realism. It captures detailed sound source properties (e.g., level, point sources, extended sources, directivity characteristics, and Doppler effects) as well as complex environmental interactions (e.g., reflections, reverberation, diffraction, and both total and partial occlusion). Additionally, it supports diverse acoustic environments, including outdoor spaces, multiroom scenes with connecting portals, and areas with dynamic openings such as doors and windows. Its rendering engine balances computational efficiency with high-quality output, making it suitable for a variety of applications.
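Two of the source properties mentioned above, level and Doppler effects, follow textbook acoustics for a moving source and listener. The sketch below is not the MPEG-I renderer's algorithm (which also models directivity, extended sources, occlusion, and more); it only illustrates the underlying physics, assuming a speed of sound of 343 m/s and a simple inverse-distance (1/r) level law. All names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius (assumed)

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def doppler_and_gain(freq_hz, src_pos, src_vel, lst_pos, lst_vel,
                     ref_dist=1.0, c=SPEED_OF_SOUND):
    """Observed frequency and distance gain for a moving source and listener (3D)."""
    d = _sub(lst_pos, src_pos)
    dist = math.sqrt(_dot(d, d))          # positions must not coincide
    u = tuple(x / dist for x in d)        # unit vector from source to listener
    v_src = _dot(src_vel, u)              # source speed toward the listener
    v_lst = -_dot(lst_vel, u)             # listener speed toward the source
    observed = freq_hz * (c + v_lst) / (c - v_src)
    gain = ref_dist / max(dist, ref_dist) # inverse-distance (1/r) level law
    return observed, gain
```

For example, a 900 Hz source approaching a stationary listener at 34.3 m/s (one tenth of the speed of sound) is heard at 1000 Hz; in a 6DoF renderer, these quantities must be updated continuously as both the user and the sources move.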

Further reinforcing its impact, the upcoming ISO/IEC 23090-34 Immersive audio reference software will fully implement MPEG-I immersive audio in a real-time framework. This interactive 6DoF experience will facilitate industry adoption and accelerate innovation in immersive audio. The reference software is expected to reach FDIS status by April 2025.

With MPEG-I immersive audio, MPEG continues to set the standard for the future of interactive and spatial audio, paving the way for more immersive digital experiences.

Research aspects: Research can focus on optimizing the streaming and compression of MPEG-I immersive audio for constrained networks, ensuring efficient delivery without compromising spatial accuracy. Another key area is improving real-time 6DoF audio rendering by balancing computational efficiency and perceptual realism, particularly in modeling complex acoustic effects like occlusions, reflections, and Doppler shifts for interactive VR/AR/MR applications.

MPEG Immersive Video (Second Edition)

At the 149th MPEG meeting, MPEG Video Coding (WG 4) advanced the second edition of ISO/IEC 23090-12 MPEG immersive video (MIV) to Final Draft International Standard (FDIS), marking a significant step forward in immersive video technology.

MIV enables the efficient compression, storage, and distribution of immersive video content, where multiple real or virtual cameras capture a 3D scene. Designed for next-generation applications, the standard supports playback with six degrees of freedom (6DoF), allowing users to not only change their viewing orientation (pitch/yaw/roll) but also move freely within the scene (x/y/z). By leveraging strong hardware support for widely used video formats, MPEG immersive video provides a highly flexible framework for multi-view video plus depth (MVD) and multi-plane image (MPI) video coding, making volumetric video more accessible and efficient.

With the second edition, MPEG continues to expand the capabilities of MPEG immersive video, introducing a range of new technologies to enhance coding efficiency and support more advanced immersive experiences. Key additions include:

Geometry coding using luma and chroma planes, improving depth representation
Capture device information, enabling better reconstruction of the original scene
Patch margins and background views, optimizing scene composition
Static background atlases, reducing redundant data for stationary elements
Support for decoder-side depth estimation, enhancing depth accuracy
Chroma dynamic range modification, improving color fidelity
Piecewise linear normalized disparity quantization and linear depth quantization, refining depth precision
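The trade-off behind the depth quantization tools listed above can be illustrated by comparing linear depth quantization with normalized disparity (inverse depth) quantization between a near and a far plane. The formulas below are generic textbook mappings assumed for illustration; the exact syntax and the new piecewise linear variant in the MIV specification are not reproduced, and all function names are hypothetical.

```python
def quantize_linear_depth(z, z_near, z_far, bits):
    """Uniform steps in depth between the near and far planes."""
    levels = (1 << bits) - 1
    return round((z - z_near) / (z_far - z_near) * levels)

def dequantize_linear_depth(q, z_near, z_far, bits):
    levels = (1 << bits) - 1
    return z_near + q / levels * (z_far - z_near)

def quantize_norm_disparity(z, z_near, z_far, bits):
    """Uniform steps in 1/z: fine depth resolution near the camera, coarse far away."""
    levels = (1 << bits) - 1
    d, d_near, d_far = 1.0 / z, 1.0 / z_near, 1.0 / z_far
    return round((d - d_far) / (d_near - d_far) * levels)

def dequantize_norm_disparity(q, z_near, z_far, bits):
    levels = (1 << bits) - 1
    d_near, d_far = 1.0 / z_near, 1.0 / z_far
    return 1.0 / (d_far + q / levels * (d_near - d_far))
```

With a near plane at 0.5 m, a far plane at 50 m, and 10-bit samples, disparity quantization reconstructs a point at 0.73 m far more precisely than linear depth, while linear depth is more precise at 40 m. Offering both mappings (and piecewise combinations) lets content match precision to where geometry errors are most visible.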

The second edition also introduces two new profiles: (1) MIV Simple MPI profile, allowing MPI content playback with a single 2D video decoder, and (2) MIV 2 profile, a superset of existing profiles that incorporates all newly added tools.

With these advancements, MPEG immersive video continues to push the boundaries of immersive media, providing a robust and efficient solution for next-generation video applications.

Research aspects: Possible research may explore advancements in MPEG immersive video to improve compression efficiency and real-time streaming while preserving depth accuracy and spatial quality. Another key area is enhancing 6DoF video rendering by leveraging new coding tools like decoder-side depth estimation and geometry coding, enabling more precise scene reconstruction and seamless user interaction in volumetric video applications.

MPEG-DASH (Sixth Edition)

At the 149th MPEG meeting, MPEG Systems (WG 3) advanced the sixth edition of MPEG-DASH (ISO/IEC 23009-1 Media presentation description and segment formats) by promoting it to the Final Draft International Standard (FDIS), the final stage of standards development. This milestone underscores MPEG’s ongoing commitment to innovation and responsiveness to evolving market needs.

The sixth edition introduces several key enhancements to improve the flexibility and efficiency of MPEG-DASH:

Alternative media presentation support, enabling seamless switching between main and alternative streams
Content steering signaling across multiple CDNs, optimizing content delivery
Enhanced segment sequence addressing, improving low-latency streaming and faster tune-in
Compact duration signaling using patterns, reducing MPD overhead
Support for Common Media Client Data (CMCD), enabling better client-side analytics
Nonlinear playback for interactive storylines, expanding support for next-generation media experiences
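Common Media Client Data itself is specified in CTA-5004: the client attaches key/value pairs such as encoded bitrate (`br`), buffer length (`bl`), object type (`ot`), or session ID (`sid`) to its segment requests, for example as a `CMCD` query argument. The sketch below follows our reading of that serialization (keys in alphabetical order, strings quoted, token values unquoted, boolean true sent as the bare key) and covers only a subset of the rules; the helper names, the example URL, and the values are hypothetical.

```python
from urllib.parse import quote

TOKEN_KEYS = {"ot", "sf", "st"}  # keys whose values are unquoted tokens (subset)

def cmcd_payload(data: dict) -> str:
    """Serialize CMCD key/value pairs into a comma-separated payload."""
    parts = []
    for key in sorted(data):
        value = data[key]
        if value is True:
            parts.append(key)  # boolean true: key only
        elif value is False or value is None:
            continue           # this sketch simply omits false/absent values
        elif isinstance(value, (int, float)) or key in TOKEN_KEYS:
            parts.append(f"{key}={value}")
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
    return ",".join(parts)

def add_cmcd_query(url: str, data: dict) -> str:
    """Append the URL-encoded payload as a CMCD query argument."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}CMCD={quote(cmcd_payload(data), safe='')}"
```

For example, `cmcd_payload({"br": 3200, "bl": 21300, "ot": "v", "sid": "6e2fb550", "su": True})` yields `bl=21300,br=3200,ot=v,sid="6e2fb550",su`, which a CDN can log and aggregate for client-side analytics.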

With these advancements, MPEG-DASH continues to evolve as a robust and scalable solution for adaptive streaming, ensuring greater efficiency, flexibility, and enhanced user experiences across a wide range of applications.

Research aspects: While advancing MPEG-DASH for more efficient and flexible adaptive streaming has been the subject of research for a while, optimizing content delivery across multiple CDNs while minimizing latency and optimizing QoE remains an open issue. Another key area is enhancing interactivity and user experiences by leveraging new features like nonlinear playback for interactive storylines and improved client-side analytics through Common Media Client Data (CMCD).

The 150th MPEG meeting will be held online from March 31 to April 04, 2025. Click here for more information about MPEG meetings and their developments.

Friday, December 6, 2024

MPEG news: a report from the 148th meeting


The 148th MPEG meeting took place in Kemer, Türkiye, from November 4 to 8, 2024. The official press release can be found here and includes the following highlights:

Point Cloud Coding: AI-based point cloud coding & enhanced G-PCC
MPEG Systems: New Part of MPEG DASH for redundant encoding and packaging, reference software and conformance of ISOBMFF, and a new structural CMAF brand profile
Video Coding: New part of MPEG-AI and 2nd edition of conformance and reference software for MPEG Immersive Video (MIV)
MPEG completes subjective quality testing for film grain synthesis using the Film Grain Characteristics SEI message

148th MPEG Meeting, Kemer, Türkiye, November 4-8, 2024.

Point Cloud Coding

At the 148th MPEG meeting, MPEG Coding of 3D Graphics and Haptics (WG 7) launched a new AI-based Point Cloud Coding standardization project. MPEG WG 7 reviewed six responses to a Call for Proposals (CfP) issued in April 2024 targeting the full range of point cloud formats, from dense point clouds used in immersive applications to sparse point clouds generated by Light Detection and Ranging (LiDAR) sensors in autonomous driving. With bit depths ranging from 10 to 18 bits, the CfP called for solutions that could meet the precision requirements of these varied use cases.

Among the six reviewed proposals, the leading proposal distinguished itself with a hybrid coding strategy that integrates end-to-end learning-based geometry coding and traditional attribute coding. This proposal demonstrated exceptional adaptability, capable of efficiently encoding both dense point clouds for immersive experiences and sparse point clouds from LiDAR sensors. With its unified design, the system supports inter-prediction coding using a shared model with intra-coding, applicable across various bitrate requirements without retraining. Furthermore, the proposal offers flexible configurations for both lossy and lossless geometry coding.

Performance assessments highlighted the leading proposal’s effectiveness, with significant bitrate reductions compared to traditional codecs: a 47% reduction for dense, dynamic sequences in immersive applications and a 35% reduction for sparse dynamic sequences in LiDAR data. For combined geometry and attribute coding, it achieved a 40% bitrate reduction across both dense and sparse dynamic sequences, while subjective evaluations confirmed its superior visual quality over baseline codecs.

The leading proposal has been selected as the initial test model, which can be seen as a baseline implementation for future improvements and developments. Additionally, MPEG issued a working draft and common test conditions.

Research aspects: The initial test model, like those for other codec test models, is typically available as open source. This enables both academia and industry to contribute to refining various elements of the upcoming AI-based Point Cloud Coding standard. Of particular interest is how training data and processes are incorporated into the standardization project and their impact on the final standard.

Another point cloud-related project is called Enhanced G-PCC, which introduces several advanced features to improve the compression and transmission of 3D point clouds. Notable enhancements include inter-frame coding, refined octree coding techniques, Trisoup surface coding for smoother geometry representation, and dynamic Optimal Binarization with Update On-the-fly (OBUF) modules. These updates provide higher compression efficiency while managing computational complexity and memory usage, making them particularly advantageous for real-time processing and high visual fidelity applications, such as LiDAR data for autonomous driving and dense point clouds for immersive media.
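The octree coding mentioned above rests on a simple idea: recursively split the bounding cube into eight children and describe each occupied node by an 8-bit occupancy code, one bit per non-empty child. The sketch below produces these codes breadth-first for integer (voxelized) points; the child bit ordering and the function name are hypothetical, and G-PCC's actual context modeling and entropy coding of the codes, where most of the compression gain comes from, are not shown.

```python
def octree_occupancy(points, depth):
    """Breadth-first 8-bit occupancy codes for integer points inside a 2^depth cube."""
    codes = []
    nodes = [set(points)]  # each node holds the points it contains; duplicates collapse
    for level in range(depth - 1, -1, -1):
        next_nodes = []
        for node in nodes:
            children = {}
            for (x, y, z) in node:
                # Child index from the bit of each coordinate at this level
                idx = (((x >> level) & 1) << 2) | (((y >> level) & 1) << 1) | ((z >> level) & 1)
                children.setdefault(idx, set()).add((x, y, z))
            code = 0
            for idx in sorted(children):
                code |= 1 << idx
                next_nodes.append(children[idx])
            codes.append(code)
        nodes = next_nodes
    return codes
```

For two points in opposite corners of a 4x4x4 cube, `octree_occupancy([(0, 0, 0), (3, 3, 3)], 2)` returns `[129, 1, 128]`: the root has children 0 and 7 occupied, and each child has a single occupied leaf. Sparse LiDAR scans produce many single-bit codes like these, while dense point clouds produce fuller codes, which is why the two cases benefit from different coding strategies.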

By adding this new part to MPEG-I, MPEG addresses the industry's growing demand for scalable, versatile 3D compression technology capable of handling both dense and sparse point clouds. Enhanced G-PCC provides a robust framework that meets the diverse needs of both current and emerging applications in 3D graphics and multimedia, solidifying its role as a vital component of modern multimedia systems.

MPEG Systems Updates

At its 148th meeting, MPEG Systems (WG 3) worked on the following aspects, among others:

New Part of MPEG DASH for redundant encoding and packaging
Reference software and conformance of ISOBMFF
A new structural CMAF brand profile

The second edition of ISO/IEC 14496-32 (reference software and conformance for ISOBMFF) introduces updated reference software and conformance guidelines, and the new CMAF brand profile supports Multi-View High Efficiency Video Coding (MV-HEVC), which is compatible with devices like Apple Vision Pro and Meta Quest 3.

The new part of MPEG DASH, ISO/IEC 23009-9, addresses redundant encoding and packaging for segmented live media (REAP). The standard is designed for scenarios where redundant encoding and packaging are essential, such as 24/7 live media production and distribution in cloud-based workflows. It specifies formats for interchangeable live media ingest and stream announcements, as well as formats for generating interchangeable media presentation descriptions. Additionally, it provides failover support and mechanisms for reintegrating distributed components in the workflow, whether they involve file-based content, live inputs, or a combination of both.

Research aspects: With the FDIS of MPEG DASH REAP available, the following topics offer potential for both academic and industry-driven research aligned with the standard's objectives (in no particular order or priority):

Optimization of redundant encoding and packaging: Investigate methods to minimize resource usage (e.g., computational power, storage, and bandwidth) in redundant encoding and packaging workflows. Explore trade-offs between redundancy levels and quality of service (QoS) in segmented live media scenarios.
Interoperability of live media Ingest formats: Evaluate the interoperability of the standard's formats with existing live media workflows and tools. Develop techniques for seamless integration with legacy systems and emerging cloud-based media workflows.
Failover mechanisms for cloud-based workflows: Study the reliability and latency of failover mechanisms in distributed live media workflows. Propose enhancements to the reintegration of failed components to maintain uninterrupted service.
Standardized stream announcements and descriptions: Analyze the efficiency and scalability of stream announcement formats in large-scale live streaming scenarios. Research methods for dynamically updating media presentation descriptions during live events.
Hybrid workflow support: Investigate the challenges and opportunities in combining file-based and live input workflows within the standard. Explore strategies for adaptive workflow transitions between live and on-demand content.
Cloud-based workflow scalability: Examine the scalability of the REAP standard in high-demand scenarios, such as global live event streaming. Study the impact of cloud-based distributed workflows on latency and synchronization.
Security and resilience: Research security challenges related to redundant encoding and packaging in cloud environments. Develop techniques to enhance the resilience of workflows against cyberattacks or system failures.
Performance metrics and quality assessment: Define performance metrics for evaluating the effectiveness of REAP in live media workflows. Explore objective and subjective quality assessment methods for media streams delivered using this standard.

The current/updated status of MPEG-DASH is shown in the figure below.

MPEG-DASH status, November 2024.

Video Coding Updates

In terms of video coding, two noteworthy updates are described here:

Part 3 of MPEG-AI, ISO/IEC 23888-3 – Optimization of encoders and receiving systems for machine analysis of coded video content, reached Committee Draft Technical Report (CDTR) status
Second edition of conformance and reference software for MPEG Immersive Video (MIV). This draft includes verified and validated conformance bitstreams and encoding and decoding reference software based on version 22 of the Test model for MPEG immersive video (TMIV). The test model, objective metrics, and some other tools are publicly available at https://gitlab.com/mpeg-i-visual.

Part 3 of MPEG-AI, ISO/IEC 23888-3: This new technical report on "optimization of encoders and receiving systems for machine analysis of coded video content" is based on software experiments conducted by JVET, focusing on optimizing non-normative elements such as preprocessing, encoder settings, and postprocessing. The research explored scenarios where video signals, decoded from bitstreams compliant with the latest video compression standard, ISO/IEC 23090-3 – Versatile Video Coding (VVC), are intended for input into machine vision systems rather than for human viewing. Compared to the JVET VVC reference software encoder, which was originally optimized for human consumption, significant bit rate reductions were achieved when machine vision task precision was used as the performance criterion.

The report will include an annex with example software implementations of these non-normative algorithmic elements, applicable to VVC or other video compression standards. Additionally, it will explore the potential use of existing supplemental enhancement information messages from ISO/IEC 23002-7 – Versatile supplemental enhancement information messages for coded video bitstreams – for embedding metadata useful in these contexts.

Research aspects: (1) Focus on optimizing video encoding for machine vision tasks by refining preprocessing, encoder settings, and postprocessing to improve bit rate efficiency and task precision, compared to traditional approaches for human viewing. (2) Examine the use of metadata, specifically SEI messages from ISO/IEC 23002-7, to enhance machine analysis of compressed video, improving adaptability, performance, and interoperability.

Subjective Quality Testing for Film Grain Synthesis

At the 148th MPEG meeting, the Joint Video Experts Team of MPEG and ITU-T SG 16 (WG 5 / JVET), together with MPEG Visual Quality Assessment (AG 5), conducted a formal expert viewing experiment to assess the impact of film grain synthesis on the subjective quality of video content. This evaluation specifically focused on film grain synthesis controlled by the Film Grain Characteristics (FGC) supplemental enhancement information (SEI) message. The study aimed to demonstrate the capability of film grain synthesis to mask compression artifacts introduced by the underlying video coding schemes.

For the evaluation, FGC SEI messages were adapted to a diverse set of video sequences, including scans of original film material, digital camera noise, and synthetic film grain artificially applied to digitally captured video. The subjective performance of video reconstructed from VVC and HEVC bitstreams was compared with and without film grain synthesis. The results highlighted the effectiveness of film grain synthesis, showing a significant improvement in subjective quality and enabling bitrate savings of up to a factor of 10 for certain test points.
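An FGC SEI message carries grain model parameters rather than grain itself. The grain is synthesized after decoding and blended onto the reconstructed picture, which is why it costs almost no bits. The message can signal a frequency-filtering model, and the Python sketch below loosely imitates that idea. It keeps Gaussian noise only in a low-frequency DCT band, transforms the result back to the pixel domain, and adds it with an intensity-dependent scaling factor. This is a simplified illustration under stated assumptions. The function names, the (lower, upper, scale) interval format, the simple tiling, and skipping the DC coefficient are choices made here. The sketch does not reproduce the normative SEI syntax, the reference pseudo-random generator, or the block-level randomization used by real synthesizers.

```python
import math
import random


def dct_basis(n):
    """Orthonormal DCT-II basis matrix m[k][x]; its transpose inverts the DCT."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
             for x in range(n)] for k in range(n)]


def grain_block(n, h_cut, v_cut, seed):
    """Hypothetical frequency-filtered grain pattern of size n x n.

    Gaussian noise is placed only in DCT coefficients with horizontal index
    u <= h_cut and vertical index v <= v_cut (DC skipped, so the pattern is
    zero-mean), then brought back to the pixel domain by an inverse 2D DCT.
    Higher cutoffs admit higher frequencies and therefore finer grain."""
    rng = random.Random(seed)
    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(min(v_cut, n - 1) + 1):
        for u in range(min(h_cut, n - 1) + 1):
            if u or v:
                coeffs[v][u] = rng.gauss(0.0, 1.0)
    m = dct_basis(n)
    # Separable inverse 2D DCT: horizontal pass per frequency row, then vertical pass.
    rows = [[sum(coeffs[v][u] * m[u][x] for u in range(n)) for x in range(n)]
            for v in range(n)]
    return [[sum(m[v][y] * rows[v][x] for v in range(n)) for x in range(n)]
            for y in range(n)]


def apply_grain(luma, grain, intervals):
    """Blend a tiled grain pattern onto 8-bit luma samples.

    `intervals` is a hypothetical list of (lower, upper, scale) tuples: a sample
    whose decoded value lies in [lower, upper] receives scale * grain; samples
    outside every interval are left unchanged. Results are clipped to 0..255."""
    gh, gw = len(grain), len(grain[0])
    out = []
    for y, row in enumerate(luma):
        out_row = []
        for x, val in enumerate(row):
            scale = next((s for lo, hi, s in intervals if lo <= val <= hi), 0.0)
            noisy = val + scale * grain[y % gh][x % gw]
            out_row.append(min(255, max(0, round(noisy))))
        out.append(out_row)
    return out


decoded = [[128] * 16 for _ in range(16)]  # flat mid-grey block
grain = grain_block(16, h_cut=4, v_cut=4, seed=7)
grainy = apply_grain(decoded, grain, [(0, 255, 12.0)])
```

Making the scale a function of decoded intensity mirrors how real film grain varies between shadows and highlights. In an experiment like the one above, tuning the SEI parameters per sequence amounts to choosing the cutoffs and per-interval scales.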

This study opens several avenues for further research:

Optimization of film grain synthesis techniques: Investigating how different grain synthesis methods affect the perceptual quality of video across a broader range of content and compression levels.
Compression artifact mitigation: Exploring the interaction between film grain synthesis and specific types of compression artifacts, with a focus on improving masking efficiency.
Adaptation of FGC SEI messages: Developing advanced algorithms for tailoring FGC SEI messages to dynamically adapt to diverse video characteristics, including real-time encoding scenarios.
Bitrate savings analysis: Examining the trade-offs between bitrate savings and subjective quality across various coding standards and network conditions.

The 149th MPEG meeting will be held in Geneva, Switzerland, from January 20 to 24, 2025. Click here for more information about MPEG meetings and their developments.
