Showing posts with label mpeg-i. Show all posts
Showing posts with label mpeg-i. Show all posts

Friday, July 5, 2019

Workshop on Coding Technologies for Immersive Audio/Visual Experiences

This is an invitation to attend the Workshop on Coding Technologies for Immersive Audio/Visual Experiences. It is open not limited to MPEG members but publicly accessible with free-of-charge preregistration. Non-MPEG member may send your registration information to Lu Yu  and Silke Kenzler  with your name, affiliation and email address.

This workshop will cover the MPEG-I immersive Audio and Visual activities - past, present and future, demonstrate systems and technologies and discuss future requirements on standardization to provide audio/visual immersive experiences.

Time/Date: 13:00-18:00, 10 July, 2019

Address: 
Room: Drottingporten 3
Clarion Post Hotel
Drottningtorget 10
411 03 Gothenburg, Sweden

Programm: 




1300-1315
Introduction 
(Lu Yu, Zhejiang University)
1315-1345
Usecases and challenges about user immersive experiences
(Valerie Allie, InterDigital)
1345-1415
Overview of technologies for immersive visual experiences: capture, processing, compression, standardization and display
(Marek Domanski, Poznan University of Technology)
1415-1445
MPEG-I Immersive Audio
(Schuyler Quackenbush, Audio Research Labs)
1445-1455
Brief introduction about demos: 
    Integral photography display (NHK)
    Realtime interactive demo with 3DoF+ content (InterDigital)
    Plenoptic 2.0 video camera (Tsinghua University)
    A simple free-viewpoint television system (Poznan University of Technology)
1455-1530
Demos 
Coffee break
1530-1600
360° and 3DoF+ video
(Bart Kroon, Philips)
1600-1630
Point cloud compression
(Marius Preda, Telecom SudParis, CNRS Samovar)
1630-1700
How can we achieve 6DoF video compression?
(Joel Jung, Orange)
1700-1730
How can we achieve lenslet video compression? 
(Xin Jin, Tsinghua University, 
Mehrdad Teratani, Nagoya University)
1730-1800
Discussion

Monday, February 18, 2019

MPEG news: a report from the 125th meeting, Marrakesh, Morocco

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.


The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:
  • Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
  • 3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
  • MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
  • ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
  • MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description
The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as the incurred media processing requires offloading complex media processing operations to the cloud/network to keep receiver hardware simple and power consumption low. Therefore, NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (+ support for 5G core and edge computing) and (ii) a workflow manager to enable composition of multiple media processing tasks (i.e., process incoming media and metadata from a media source and produce processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard now reached Committee Draft (CD) stage and final milestone is targeted for early 2020.

In particular, a standard like NBMP might become handy in the context of 5G in combination with mobile edge computing (MEC) which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context conditions and needs, hopefully leading to a better quality and user experience.

For further research aspects please see one of my previous posts

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on such a new video coding standard that should have two profiles: (i) royalty-free baseline profile and (ii) main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or else switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there’s the need to support AVC, HEVC, VP9, and AV1; soon VVC will become important. In other words, we already have a multi-codec environment to support and one might argue one more codec is probably not a big issue. The main benefit of EVC will be a royalty-free baseline profile but with AV1 there’s already such a codec available and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will be also interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as basis for many file (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files are available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.

Wednesday, August 15, 2018

MPEG news: a report from the 123rd meeting, Ljubljana, Slovenia

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.
The MPEG press release comprises the following topics:

  • MPEG issues Call for Evidence on Compressed Representation of Neural Networks
  • Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
  • MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
  • MPEG releases software for MPEG-I visual activities
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation and many other fields. Their recent success is based on the feasibility of processing much larger and complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in a quite large size (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a larger number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case, in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.
At its 123rd meeting, MPEG has issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, and memory consumption and the impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Visual Analysis) and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: As this is about "compression" of structured data, research aspects will mainly focus around compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms may become relevant in the (near) future. Furthermore, additional use cases should be communicated towards MPEG until the next meeting.

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, or viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by the entities in the networks. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and makes these NBMP services accessible through Application Programming Interfaces (APIs).
NBMP will support the existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. MPEG issued a Call for Proposal (CfP) seeking technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.
After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.
Research aspects: I reported about NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123nd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along X, Y and Z axes), and 6DoF (3DoF with full translational movements along X, Y and Z axes) experiences but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.
Research aspects: ISO/IEC technical reports are typically publicly available and provides informative descriptions of what the standard is about. In MPEG-I this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)) and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest for the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six degrees of freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+ with slight body and head movements in a sitting position to 6DoF allowing some walking steps from a central position. At the 123nd MPEG meeting, an important progress has been achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, allowing to synthesize virtual viewpoints from an unlimited number of input views. RVS integrates code bases from Universite Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+ expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: MPEG does not only produce text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. Thus, it is very much appreciated to have a new Reference View Synthesizer (RVS 2.0) and Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available which enables interoperability and reproducibility of R&D efforts/results in this area.

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF has reached the first milestone. Amendment 2 to ISO/IEC 14496-12 6th edition will add the option to have relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files and will allow creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment reached its first milestone is the first amendment to ISO/IEC 23001-7 3rd edition. It will allow use of multiple keys to a single sample and scramble some parts of AVC or HEVC video bitstreams without breaking conformance to the existing decoders. That is, the bitstream will be decodable by existing decoders, but some parts of the video will be scrambled. It is expected that these amendments will reach the final milestone in Q3 2019.
Research aspects: The ISOBMFF reference software is now available on Github, which is a valuable service to the community and allows for active standard's participation even from outside of MPEG. It is recommended that interested parties have a look at it and consider contributing to this project.

What else happened at #MPEG123?

  • The MPEG-DASH 3rd edition is finally available as output document (N17813; only available to MPEG members) combining 2nd edition, four amendments, and 2 corrigenda. We expect final publication later this year or early next year.
  • There is a new DASH amendment and corrigenda items in pipeline which should progress to final stages also some time next year. The status of MPEG-DASH (July 2018) can be seen below.
  • MPEG received a rather interesting input document related to “streaming first” which resulted into a publicly available output document entitled “thoughts on adaptive delivery and access to immersive media”. The key idea here is to focus on streaming (first) rather than on file/encapsulation formats typically used for storage (and streaming second). This document should become available here.
  • Since a couple of meetings, MPEG maintains a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It definitely worth keeping this in mind when defining/updating your own roadmap.
  • JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG internal but JVET-documents are publicly accessible here: http://phenix.it-sudparis.eu/jvet/. An interesting aspect is that VTM2/WD2 should have >20% rate reduction compared to HEVC, all with reasonable complexity and the next benchmark set (BMS) should have close to 30% rate reduction vs. HEVC. Further improvements expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
  • In addition to VVC -- which is a joint activity with VCEG --, MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low complexity enhancement video codec. Both topics will be further discussed within respective Ad-hoc Groups (AhGs) and further details are available here.
  • Finally, MPEG established an Ad-hoc Group (AhG) dedicated to the long-term planning which is also looking into application areas/domains other than media coding/representation.
In this context it is probably worth mentioning the following DASH awards at recent conferences
Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting

Thursday, May 10, 2018

MPEG news: a report from the 122nd meeting, San Diego, CA, USA

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.


The MPEG press release comprises the following topics:
  • Versatile Video Coding (VVC) project starts strongly in the Joint Video Experts Team
  • MPEG issues Call for Proposals on Network-based Media Processing
  • MPEG finalizes 7th edition of MPEG-2 Systems Standard
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with two new features
  • MPEG-G standards reach Draft International Standard for transport and compression technologies

Versatile Video Coding (VVC) – MPEG’ & VCEG’s new video coding project starts strong

The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include — but not limited to — 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Interestingly, some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well-beyond the targeted 50% for the final standard.

Research aspects: Compression tools and everything around it including its objective and subjective assessment. The main application area is clearly 360-degree and HDR. Watch out conferences like PCS and ICIP (later this year), which will be full of papers making references to VVC. Interestingly, VVC comes with a first draft, a test model for simulation experiments, and a technology benchmark set which is useful and important for any developments for both inside and outside MPEG as it allows for reproducibility.

MPEG issues Call for Proposals on Network-based Media Processing

This Call for Proposals (CfP) addresses advanced media processing technologies such as network stitching for VR service, super resolution for enhanced visual quality, transcoding, and viewport extraction for 360-degree video within the network environment that allows service providers and end users to describe media processing operations that are to be performed by the network. Therefore, the aim of network-based media processing (NBMP) is to allow end user devices to offload certain kinds of processing to the network. Therefore, NBMP describes the composition of network-based media processing services based on a set of media processing functions and makes them accessible through Application Programming Interfaces (APIs). Responses to the NBMP CfP will be evaluated on the weekend prior to the 123rd MPEG meeting in July 2018.

Research aspects: This project reminds me a lot about what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG targets APIs rather than pure metadata formats, which is a step forward into the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

7th edition of MPEG-2 Systems Standard and ISO Base Media File Format (ISOBMFF) with two new features

More than 20 years since its inception development of MPEG-2 systems technology (i.e., transport/program stream) continues. New features include support for: (i) JPEG 2000 video with 4K resolution and ultra-low latency, (ii) media orchestration related metadata, (iii) sample variance, and (iv) HEVC tiles.

The partial file format enables the description of an ISOBMFF file partially received over lossy communication channels. This format provides tools to describe reception data, the received data and document transmission information such as received or lost byte ranges and whether the corrupted/lost bytes are present in the file and repair information such as location of the source file, possible byte offsets in that source, byte stream position at which a parser can try processing a corrupted file. Depending on the communication channel, this information may be setup by the receiver or through out-of-band means.

ISOBMFF's sample variants (2nd edition), which are typically used to provide forensic information in the rendered sample data that can, for example, identify the specific Digital Rights Management (DRM) client which has decrypted the content. This variant framework is intended to be fully compatible with MPEG’s Common Encryption (CENC) and agnostic to the particular forensic marking system used.

Research aspects: MPEG systems standards are mainly relevant for multimedia systems research with all its characteristics. The partial file format is specifically interesting as it targets scenarios with lossy communication channels.

MPEG-G standards reach Draft International Standard for transport and compression technologies

MPEG-G provides a set of standards enabling interoperability for applications and services dealing with high-throughput deoxyribonucleic acid (DNA) sequencing. At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to Draft International Standard (DIS) stage. Such parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport including streaming of genomic data by interoperable applications. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Research aspects: the main focus of this work item is compression and transport is still in its infancy. Therefore, research on the actual delivery for compressed DNA information as well as its processing is solicited.

What else happened at MPEG122?

  • Requirements is exploring new video coding tools dealing with low-complexity and process enhancements.
  • The activity around coded representation of neural networks has defined a set of vital use cases and is now calling for test data to be solicited until the next meeting.
  • The MP4 registration authority (MP4RA) has a new awesome web site http://mp4ra.org/.
  • MPEG-DASH is finally approving and working the 3rd edition comprising consolidated version of recent amendments and corrigenda.
  • CMAF started an exploration on multi-stream support, which could be relevant for tiled streaming and multi-channel audio.
  • OMAF kicked-off its activity towards a 2nd edition enabling support for 3DoF+ and social VR with the plan going to committee draft (CD) in Oct’18. Additionally, there’s a test framework proposed, which allows to assess performance of various OMAF tools. Its main focus is on video but MPEG’s audio subgroup has a similar framework to enable subjective testing. It could be interesting seeing these two frameworks combined in one way or the other.
  • MPEG-I architectures (yes plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closer at 3DoF+ defining common test conditions and a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF and compression of dense representation of light fields are ongoing and have been started, respectively.
  • Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools resulting into updated versions of the test model and working draft.
Research aspects: In this section I would like to focus on DASH, CMAF, and OMAF. Multi-stream support, as mentioned above, is relevant for tiled streaming and multi-channel audio which has been recently studied in the literature and is also highly relevant for industry. The efficient storage and streaming of such kind of content within the file format is an important aspect and often underrepresented in both research and standardization. The goal here is to keep the overhead low while maximizing the utility of the format to enable certain functionalities. OMAF now targets the social VR use case, which has been discussed in the research literature for a while and, finally, makes its way into standardization. An important aspect here is both user and quality of experience, which requires intensive subjective testing.

Finally, on May 10 MPEG will celebrate 30 years as its first meeting dates back to 1988 in Ottawa, Canada with around 30 attendees. The 122nd meeting had more than 500 attendees and MPEG has around 20 active work items. A total of more than 170 standards have been produces (that’s approx. six standards per year) where some standards have up to nine editions like the HEVC standards. Overall, MPEG is responsible for more that 23% of all JTC 1 standards and some of them showing extraordinary longevity regarding extensions, e.g., MPEG-2 systems (24 years), MPEG-4 file format (19 years), and AVC (15 years). MPEG standards serve billions of users (e.g., MPEG-1 video, MP2, MP3, AAC, MPEG-2, AVC, ISOBMFF, DASH). Some — more precisely five — standards have receive Emmy awards in the past (MPEG-1, MPEG-2, AVC (2x), and HEVC).
Tag cloud generated from all existing MPEG press releases.
Thus, happy birthday MPEG! In today’s society starts the high performance era with 30 years, basically the time of “compression”, i.e., we apply all what we learnt and live out everything, truly optimistic perspective for our generation X (millennials) standards body!

Sunday, March 18, 2018

HVEI'18: A Framework for Adaptive Delivery of Omnidirectional Video

A Framework for Adaptive Delivery of Omnidirectional Video

Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin) and Ali C. Begen (Ozyegin University / Networked Media)

Abstract: Omnidirectional or 360-degree videos are considered as a next step towards a truly immersive media experience. Such videos allow the user to change her/his viewing direction while consuming the video. The download-and-play paradigm (including DVD and Blu-ray) is replaced by streaming, and the content is hosted solely within the cloud. This paper addresses the need for a scientific framework enabling the adaptive delivery of omnidirectional video within heterogeneous environments. We consider the state-of-the-art techniques for adaptive streaming over HTTP and extend them towards omnidirectional/360-degree videos. In particular, we review the encoding and adaptive streaming options, and present preliminary results reported in the literature. Finally, we provide an overview about the ongoing standardization efforts and highlight the major open issues.

Slides:

Tuesday, May 30, 2017

MPEG news: a report from the 118th meeting, Hobart, Australia

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.

MPEG News Archive
The entire MPEG press release can be found here comprising the following topics:
  • Coded Representation of Immersive Media (MPEG-I): new work item approved and call for test data issued
  • Common Media Application Format (CMAF): FDIS approved
  • Beyond High Efficiency Video Coding (HEVC): call for evidence for "beyond HEVC" and verification tests for screen content coding extensions of HEVC

Coded Representation of Immersive Media (MPEG-I)

MPEG started to work on the new work item referred to as ISO/IEC 23090 with the "nickname" MPEG-I targeting future immersive applications. The goal of this new standard is to enable various forms of audio-visual immersion including panoramic video with 2D and 3D audio with various degrees of true 3D visual perception. It currently comprises five parts: (pt. 1) a technical report describing the scope of this new standard and a set of use cases and applications; (pt. 2) an application format for omnidirectional media (aka OMAF) to address the urgent need of the industry for a standard is this area; (pt. 3) immersive video which is a kind of placeholder for the successor of HEVC (if at all); (pt. 4) immersive audio as a placeholder for the successor of 3D audio (if at all); and (pt. 5) for point cloud compression. The point cloud compression standard targets lossy compression for point clouds in real-time communication, six Degrees of Freedom (6 DoF) virtual reality, and the dynamic mapping for autonomous driving, cultural heritage applications, etc. Part 2 is related to OMAF which I've discussed in my previous blog post.

MPEG also established an Ad-hoc Group (AhG) on immersive Media quality evaluation with the following mandates: 1. Produce a document on VR QoE requirements; 2. Collect test material with immersive video and audio signals; 3. Study existing methods to assess human perception and reaction to VR stimuli; 4. Develop test methodology for immersive media, including simultaneous video and audio; 5. Study VR experience metrics and their measurability in VR services and devices. AhGs are open to everybody and mostly discussed using mailing lists (join here https://lists.aau.at/mailman/listinfo/immersive-quality). Interestingly, a Joint Qualinet-VQEG team on Immersive Media (JQVIM) has been recently established with similar goals and also the VR Industry Forum (VRIF) has issued a call for VR360 content. It seems there's a strong need for a dataset similar to the one we have created for MPEG-DASH long time ago.

The JQVIM has been created as part of the QUALINET task force on "Immersive Media Experiences (IMEx)" which aims at providing end users the sensation of being part of the particular media which shall result in a worthwhile, informative user and quality of experience. The main goals are providing datasets and tools (hardware/software), subjective quality evaluations, field studies, cross- validation including a strong theoretical foundation relevant along the empirical databases and tools which hopefully results in a framework, methodology, and best practices for immersive media experiences.

Common Media Application Format (CMAF)

The Final Draft International Standard (FDIS) has been issued at the 118th MPEG meeting which concludes the formal technical development process of the standard. At this point in time national bodies can only vote Yes|No and editorial changes are allowed (if any) before the International Standard (IS) becomes available. The goal of CMAF is to define a single format for the transport and storage of segmented media including audio/video formats, subtitles, and encryption -- it is derived from the ISO Base Media File Format (ISOBMFF). As it's a combination of various MPEG standard it's referred to as an Application Format (AS) which mainly takes existing formats/standards and glues them together for a specific target application. The CMAF standard clearly targets dynamic adaptive streaming (over -- but not limited to -- HTTP) but focusing on the media format only and excluding the manifest format. Thus, the CMAF standard shall be compatible with other formats such as MPEG-DASH and HLS. In fact, HLS has been extended already some time ago to support 'fragmented MP4' which we have demonstrated also and it has been interpreted as a first step towards the harmonization of MPEG-DASH and HLS; at least on the segment format. The delivery of CMAF contents with DASH will be described in part 7 of MPEG-DASH that basically comprises a mapping of CMAF concepts to DASH terms.

From a research perspective, it would be interesting to explore how certain CMAF concepts are able to address current industry needs, specifically in the context of low-latency streaming which has been demonstrated recently.

Beyond HEVC...

The preliminary call for evidence (CfE) on video compression with capability beyond HEVC has been issued and is addressed to interested parties that have technology providing better compression capability than the existing standard, either for conventional video material, or for other domains such as HDR/WCG or 360-degree ("VR") video. Test cases are defined for SDR, HDR, and 360-degree content. This call has been made jointly by ISO/IEC MPEG and ITU-T SG16/Q6 (VCEG). The evaluation of the responses is scheduled for July 2017 and depending on the outcome of the CfE, the parent bodies of the Joint Video Exploration Team (JVET) of MPEG and VCEG collaboration intend to issue a Draft Call for Proposals by the end of the July meeting.

Finally, verification tests have been conducted for the Screen Content Coding (SCC) extensions to HEVC showing exceptional performance. Screen content is video containing a significant proportion of rendered (moving or static) graphics, text, or animation rather than, or in addition to, camera-captured video scenes. For scenes containing a substantial amount of text and graphics, the tests showed a major benefit in compression capability for the new extensions over both the Advanced Video Coding standard and the previous version of the newer HEVC standard without the new SCC features.

The question whether and how new codecs like (beyond) HEVC competes with AV1 is subject to research and development. It has been discussed also in the scientific literature but lacks of vendor neutral comparison which is difficult to achieve and not to compare apples with oranges (due to the high number of different coding tools and parameters). An important aspect which always needs to be considered is one typically compares specific implementations of a coding format and not the standard as the encoding is usually not defined, only the bitstream syntax that implicitly defines the decoder.

Publicly available documents from the 118th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Torino, Italy, July 17-21, 2017. Feel free to contact us for any questions or comments.

Friday, February 10, 2017

MPEG news: a report from the 117th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.
MPEG News Archive
The 117th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Committee Draft of the Omnidirectional Media Application Format (OMAF)
  • MPEG-H 3D Audio Verification Test Report
  • MPEG Workshop on 5-Year Roadmap Successfully Held in Geneva
  • Call for Proposals (CfP) for Point Cloud Compression (PCC)
  • Preliminary Call for Evidence on video compression with capability beyond HEVC
  • MPEG issues Committee Draft of the Media Orchestration (MORE) Standard
  • Technical Report on HDR/WCG Video Coding
In this blog post, I'd like to focus on the topics related to multimedia communication. Thus, let's start with OMAF.

Omnidirectional Media Application Format (OMAF)

Real-time entertainment services deployed over the open, unmanaged Internet – streaming audio and video – account now for more than 70% of the evening traffic in North American fixed access networks and it is assumed that this figure will reach 80 percent by 2020. More and more such bandwidth hungry applications and services are pushing onto the market including immersive media services such as virtual reality and, specifically 360-degree videos. However, the lack of appropriate standards and, consequently, reduced interoperability is becoming an issue. Thus, MPEG has started a project referred to as Omnidirectional Media Application Format (OMAF). The first milestone of this standard has been reached and the committee draft (CD) has been approved at the 117th MPEG meeting. Such application formats "are essentially superformats that combine selected technology components from MPEG (and other) standards to provide greater application interoperability, which helps satisfy users' growing need for better-integrated multimedia solutions" [MPEG-A]." In the context of OMAF, the following aspects are defined:
  • Equirectangular projection format (note: others might be added in the future)
  • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
  • Storage format: ISO base media file format (ISOBMFF)
  • Codecs: High Efficiency Video Coding (HEVC) and MPEG-H 3D audio
OMAF is the first specification which is defined as part of a bigger project currently referred to as ISO/IEC 23090 -- Immersive Media (Coded Representation of Immersive Media). It currently has the acronym MPEG-I and we have previously used MPEG-VR which is now replaced by MPEG-I (that still might chance in the future). It is expected that the standard will become Final Draft International Standard (FDIS) by Q4 of 2017. Interestingly, it does not include AVC and AAC, probably the most obvious candidates for video and audio codecs which have been massively deployed in the last decade and probably still will be a major dominator (and also denominator) in upcoming years. On the other hand, the equirectangular projection format is currently the only one defined as it is broadly used already in off-the-shelf hardware/software solutions for the creation of omnidirectional/360-degree videos. Finally, the metadata formats enabling the rendering of 360-degree monoscopic and stereoscopic video is highly appreciated. A solution for MPEG-DASH based on AVC/AAC utilizing equirectangular projection format for both monoscopic and stereoscopic video is shown as part of Bitmovin's solution for VR and 360-degree video.

Research aspects related to OMAF can be summarized as follows:
  • HEVC supports tiles which allow for efficient streaming of omnidirectional video but HEVC is not as widely deployed as AVC. Thus, it would be interesting how to mimic such a tile-based streaming approach utilizing AVC.
  • The question how to efficiently encode and package HEVC tile-based video is an open issue and call for a tradeoff between tile flexibility and coding efficiency.
  • When combined with MPEG-DASH (or similar), there's a need to update the adaptation logic as the with tiles yet another dimension is added that needs to be considered in order to provide a good Quality of Experience (QoE).
  • QoE is a big issue here and not well covered in the literature. Various aspects are worth to be investigated including a comprehensive dataset to enable reproducibility of research results in this domain. Finally, as omnidirectional video allows for interactivity, also the user experience is becoming an issue which needs to be covered within the research community.
A second topic I'd like to highlight in this blog post is related to the preliminary call for evidence on video compression with capability beyond HEVC.

Preliminary Call for Evidence on video compression with capability beyond HEVC

A call for evidence is issued to see whether sufficient technological potential exists to start a more rigid phase of standardization. Currently, MPEG together with VCEG have developed a Joint Exploration Model (JEM) algorithm that is already known to provide bit rate reductions in the range of 20-30% for relevant test cases, as well as subjective quality benefits. The goal of this new standard -- with a preliminary target date for completion around late 2020 -- is to develop technology providing better compression capability than the existing standard, not only for conventional video material but also for other domains such as HDR/WCG or VR/360-degrees video. An important aspect in this area is certainly over-the-top video delivery (like with MPEG-DASH) which includes features such as scalability and Quality of Experience (QoE). Scalable video coding has been added to video coding standards since MPEG-2 but never reached wide-spread adoption. That might change in case it becomes a prime-time feature of a new video codec as scalable video coding clearly shows benefits when doing dynamic adaptive streaming over HTTP. QoE did find its way already into video coding, at least when it comes to evaluating the results where subjective tests are now an integral part of every new video codec developed by MPEG (in addition to usual PSNR measurements). Therefore, the most interesting research topics from a multimedia communication point of view would be to optimize the DASH-like delivery of such new codecs with respect to scalability and QoE. Note that if you don't like scalable video coding, feel free to propose something else as long as it reduces storage and networking costs significantly.

MPEG Workshop “Global Media Technology Standards for an Immersive Age”

On January 18, 2017 MPEG successfully held a public workshop on “Global Media Technology Standards for an Immersive Age” hosting a series of keynotes from Bitmovin, DVB, Orange, Sky Italia, and Technicolor. Stefan Lederer, CEO of Bitmovin discussed today's and future challenges with new forms of content like 360°, AR and VR. All slides are available here and MPEG took their feedback into consideration in an update of its 5-year standardization roadmap. David Wood (EBU) reported on the DVB VR study mission and Ralf Schaefer (Technicolor) presented a snapshot on VR services. Gilles Teniou (Orange) discussed video formats for VR pointing out a new opportunity to increase the content value but also raising a question what is missing today. Finally, Massimo Bertolotti (Sky Italia) introduced his view on the immersive media experience age.

Overall, the workshop was well attended and as mentioned above, MPEG is currently working on a new standards project related to immersive media. Currently, this project comprises five parts. The first part comprises a technical report describing the scope (incl. kind of system architecture), use cases, and applications. The second part is OMAF (see above) and the third/forth parts are related to immersive video and audio respectively. Part five is about point cloud compression.

For those interested, please check out the slides from industry representatives in this field and draw your own conclusions what could be interesting for your own research. I'm happy to see any reactions, hints, etc. in the comments..

Finally, let's have a look what happened related to MPEG-DASH, a topic with a long history on this blog.

MPEG-DASH and CMAF: Friend or Foe?

For MPEG-DASH and CMAF it was a meeting "in between" official standardization stages. MPEG-DASH experts are still working on the third edition which will be a consolidated version of the 2nd edition and various amendments and corrigenda. In the meantime, MPEG issues a white paper on the new features of MPEG-DASH which I would like to highlight here.
  • Spatial Relationship Description (SRD): allows to describe tiles and region of interests for partial delivery of media presentations. This is highly related to OMAF and VR/360-degree video streaming.
  • External MPD linking: this feature allows to describe the relationship between a single program/channel and a preview mosaic channel having all channels at once within the MPD.
  • Period continuity: simple signaling mechanism to indicate whether one period is a continuation of the previous one which is relevant for ad-insertion or live programs.
  • MPD chaining: allows for chaining two or more MPDs to each other, e.g., pre-roll ad when joining a live program.
  • Flexible segment format for broadcast TV: separates the signaling of the switching points and random access points in each stream and, thus, the content can be encoded with a good compression efficiency, yet allowing higher number of random access point, but with lower frequency of switching points.
  • Server and network-assisted DASH (SAND): enables asynchronous network-to-client and network-to-network communication of quality-related assisting information.
  • DASH with server push and WebSockets: basically addresses issues related to HTTP/2 push feature and WebSocket.
CMAF issued a study document which captures the current progress and all national bodies are encouraged to take this into account when commenting on the Committee Draft (CD). To answer the question in the headline above, it looks more and more like as DASH and CMAF will become friends -- let's hope that the friendship lasts for a long time.

What else happened at the MPEG meeting?

  • Committee Draft MORE (note: type in 'man more' on any unix/linux/max terminal and you'll get 'less - opposite of more';): MORE stands for “Media Orchestration” and provides a specification that enables the automated combination of multiple media sources (cameras, microphones) into a coherent multimedia experience. Additionally, it targets use cases where a multimedia experience is rendered on multiple devices simultaneously, again giving a consistent and coherent experience.
  • Technical Report on HDR/WCG Video Coding: This technical report comprises conversion and coding practices for High Dynamic Range (HDR) and Wide Colour Gamut (WCG) video coding (ISO/IEC 23008-14). The purpose of this document is to provide a set of publicly referenceable recommended guidelines for the operation of AVC or HEVC systems adapted for compressing HDR/WCG video for consumer distribution applications
  • CfP Point Cloud Compression (PCC): This call solicits technologies for the coding of 3D point clouds with associated attributes such as color and material properties. It will be part of the immersive media project introduced above.
  • MPEG-H 3D Audio verification test report: This report presents results of four subjective listening tests that assessed the performance of the Low Complexity Profile of MPEG-H 3D Audio. The tests covered a range of bit rates and a range of “immersive audio” use cases (i.e., from 22.2 down to 2.0 channel presentations). Seven test sites participated in the tests with a total of 288 listeners.
The next MPEG meeting will be held in Hobart, April 3-7, 2017. Feel free to contact us for any questions or comments.