
Wednesday, November 27, 2019

MPEG news: a report from the 128th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The 128th MPEG meeting concluded on October 11, 2019 in Geneva, Switzerland with the following topics:
  • Low Complexity Enhancement Video Coding (LCEVC) Promoted to Committee Draft
  • 2nd Edition of Omnidirectional Media Format (OMAF) has reached the first milestone
  • Genomic Information Representation – Part 4 Reference Software and Part 5 Conformance Promoted to Draft International Standard
The corresponding press release of the 128th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/128.
In this report we will focus on video coding aspects (i.e., LCEVC) and immersive media applications (i.e., OMAF). At the end, we will provide an update related to adaptive streaming (i.e., DASH and CMAF).

Low Complexity Enhancement Video Coding

Low Complexity Enhancement Video Coding (LCEVC) has been promoted to committee draft (CD), the first milestone in the ISO/IEC standardization process. LCEVC is part two of MPEG-5, or ISO/IEC 23094-2 if you prefer the always easy-to-remember ISO codes. We already introduced MPEG-5 in previous posts; LCEVC is a standardized video coding solution that leverages other video codecs in a manner that improves video compression efficiency while maintaining or lowering the overall encoding and decoding complexity.
The LCEVC standard uses a lightweight video codec to add up to two layers of encoded residuals. These layers correct artefacts produced by the base video codec and add detail and sharpness to the final output video.
The standard targets software or hardware codecs with extra processing capabilities, e.g., mobile devices, set-top boxes (STBs), and personal-computer-based decoders. Additional benefits are a reduction in implementation complexity or a corresponding increase in spatial resolution.
LCEVC is based on existing codecs, which allows for backwards compatibility with existing deployments. Supporting LCEVC enables “softwareized” video coding, allowing for release and deployment options known from software-based solutions, which are well understood by software companies and, thus, opens new opportunities for improving and optimizing video-based services and applications.
Research aspects: in video coding, research efforts are mainly related to coding efficiency and complexity (as usual). However, as MPEG-5 basically adds a software layer on top of what is typically implemented in hardware, all kinds of aspects related to software engineering could become an active area of research.
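The layered reconstruction idea can be sketched in a few lines. The following toy example is our own illustration, not the normative LCEVC decoding process: it upsamples a half-resolution base layer and adds an enhancement residual to recover the full-resolution samples.

```python
# Toy sketch of the layered-enhancement idea: a base codec operates at
# reduced resolution and an enhancement layer adds decoded residuals on top.

def upsample_2x(row):
    """Nearest-neighbour upsampling of a 1-D row of samples."""
    out = []
    for s in row:
        out.extend([s, s])
    return out

def reconstruct(base_row, residual_row, bit_depth=8):
    """Upsample the base-layer row, add the residual, and clip to bit depth."""
    up = upsample_2x(base_row)
    max_val = (1 << bit_depth) - 1
    return [min(max(u + r, 0), max_val) for u, r in zip(up, residual_row)]

source   = [10, 12, 200, 202, 50, 48, 90, 92]   # original samples
base     = [11, 201, 49, 91]                     # half-resolution base layer
residual = [s - u for s, u in zip(source, upsample_2x(base))]

print(reconstruct(base, residual))  # recovers the original samples
```

The point of the sketch is that the enhancement layer only carries small residual values, which are cheap to code, while the heavy lifting stays with the base codec.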

Omnidirectional Media Format

The scope of the Omnidirectional Media Format (OMAF) is about 360° video, images, audio and associated timed text and specifies (i) a coordinate system, (ii) projection and rectangular region-wise packing methods, (iii) storage of omnidirectional media and the associated metadata using ISOBMFF, (iv) encapsulation, signaling and streaming of omnidirectional media in DASH and MMT, and (v) media profiles and presentation profiles.
At this meeting, the second edition of OMAF (ISO/IEC 23090-2) has been promoted to committee draft (CD), which includes:
  • support of improved overlay of graphics or textual data on top of video,
  • efficient signaling of videos structured in multiple sub-parts,
  • enabling more than one viewpoint, and
  • new profiles supporting dynamic bitstream generation according to the viewport.
As for the first edition, OMAF includes encapsulation and signaling in ISOBMFF as well as streaming of omnidirectional media (DASH and MMT). It will reach its final milestone by the end of 2020.
360° video is certainly a vital use case towards a fully immersive media experience. Devices to capture and consume such content are becoming increasingly available and will probably contribute to the dissemination of this type of content. However, it is also understood that the complexity increases significantly, specifically with respect to large-scale, scalable deployments due to increased content volume/complexity, timing constraints (latency), and quality of experience issues.
Research aspects: understanding the increased complexity of 360° video, or immersive media in general, is certainly an important aspect to be addressed towards enabling applications and services in this domain. We may even start thinking that 360° video actually works (e.g., it is possible to capture it, upload it to YouTube, and consume it on many devices), but the devil is in the detail when it comes to handling this complexity efficiently in order to enable a seamless and high quality of experience.
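One concrete source of this complexity is viewport-dependent streaming, where only the spatial parts of the 360° frame that are currently visible are fetched at high quality. The sketch below is purely illustrative (the tile grid and field-of-view model are our assumptions, not anything OMAF defines normatively): it selects the tile columns of an equirectangular layout that intersect the user's horizontal viewport.

```python
# Illustrative viewport-to-tile selection for an equirectangular layout:
# the 360° horizontal span is divided into equally wide tile columns, and
# only columns overlapping the viewer's field of view are selected.

def visible_tiles(yaw_deg, hfov_deg, num_tiles):
    """Return indices of tile columns intersecting the horizontal viewport."""
    tile_width = 360.0 / num_tiles
    lo = yaw_deg - hfov_deg / 2.0
    hi = yaw_deg + hfov_deg / 2.0
    tiles = set()
    # walk the viewport span in half-tile steps, wrapping at 360°
    deg = lo
    while deg <= hi:
        tiles.add(int(deg % 360 // tile_width))
        deg += tile_width / 2.0
    tiles.add(int(hi % 360 // tile_width))
    return sorted(tiles)

# Looking straight ahead (yaw 0°) with a 90° FOV over an 8-column grid
# selects the front columns plus the wrap-around column near 315°:
print(visible_tiles(0, 90, 8))  # [0, 1, 7]
```

Even this toy version shows where the engineering difficulty lies: tile selection must be recomputed on every head movement, and each change triggers new requests down the delivery chain.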

DASH and CMAF

The 4th edition of DASH (ISO/IEC 23009-1) will be published soon and MPEG is currently working towards a first amendment which will be about (i) CMAF support and (ii) the event processing model. An overview of all DASH standards is depicted in the figure below, notably part one of MPEG-DASH, referred to as media presentation description and segment formats.
The 2nd edition of the CMAF standard (ISO/IEC 23000-19) will become available very soon and MPEG is currently reviewing additional tools in the so-called technologies under consideration document as well as conducting various explorations. A working draft for additional media profiles is also under preparation.
Research aspects: with CMAF, low-latency support is added to DASH-like applications and services. However, the implementation specifics are actually not defined in the standard and subject to competition (e.g., here). Interestingly, the Bitmovin video developer reports from both 2018 and 2019 highlight the need for low-latency solutions in this domain.
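To see why chunked CMAF matters for latency, consider the following back-of-the-envelope sketch (our own simplification, ignoring encoding and network delay, not anything the standard specifies): with classic segment-based delivery a player can only start once a full segment has been produced, so live latency is bounded below by the segment duration, whereas chunks become available as soon as they are encoded.

```python
# Rough lower bound on live latency: the smallest unit a player can buffer
# is a full segment (segment-based delivery) or a single chunk (chunked CMAF).

def min_live_latency(segment_dur, chunk_dur=None, buffer_units=1):
    """Lower bound on live latency in seconds, given the delivery granularity."""
    unit = chunk_dur if chunk_dur else segment_dur
    return unit * buffer_units

print(min_live_latency(6.0))                 # segment-based: 6.0 s
print(min_live_latency(6.0, chunk_dur=0.5))  # chunked CMAF:  0.5 s
```

This is exactly the space left open to competition: how aggressively a player can shrink its buffer toward that bound without rebuffering is an implementation choice, not a normative one.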
At the ACM Multimedia Conference 2019 in Nice, France I gave a tutorial entitled “A Journey towards Fully Immersive Media Access” which includes updates related to DASH and CMAF. The slides are available here.

Outlook 2020

Finally, let me try to give an outlook for 2020, not so much content-wise but rather in terms of events planned for 2020 that are highly relevant for this column:
... and many more!

Thursday, May 10, 2018

MPEG news: a report from the 122nd meeting, San Diego, CA, USA

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.


The MPEG press release comprises the following topics:
  • Versatile Video Coding (VVC) project starts strongly in the Joint Video Experts Team
  • MPEG issues Call for Proposals on Network-based Media Processing
  • MPEG finalizes 7th edition of MPEG-2 Systems Standard
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with two new features
  • MPEG-G standards reach Draft International Standard for transport and compression technologies

Versatile Video Coding (VVC) – MPEG’s & VCEG’s new video coding project starts strong

The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Interestingly, some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.

Research aspects: Compression tools and everything around them, including their objective and subjective assessment. The main application areas are clearly 360-degree and HDR. Watch out for conferences like PCS and ICIP (later this year), which will be full of papers making references to VVC. Interestingly, VVC comes with a first draft, a test model for simulation experiments, and a technology benchmark set, which is useful and important for any developments both inside and outside MPEG as it allows for reproducibility.

MPEG issues Call for Proposals on Network-based Media Processing

This Call for Proposals (CfP) addresses advanced media processing technologies such as network stitching for VR services, super resolution for enhanced visual quality, transcoding, and viewport extraction for 360-degree video within the network environment, allowing service providers and end users to describe media processing operations that are to be performed by the network. The aim of network-based media processing (NBMP) is thus to allow end-user devices to offload certain kinds of processing to the network. To this end, NBMP describes the composition of network-based media processing services based on a set of media processing functions and makes them accessible through Application Programming Interfaces (APIs). Responses to the NBMP CfP will be evaluated on the weekend prior to the 123rd MPEG meeting in July 2018.

Research aspects: This project reminds me a lot of what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG now targets APIs rather than pure metadata formats, which is a step in the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.
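The NBMP idea of composing network-hosted media processing functions into a workflow can be sketched as follows; note that the Pipeline class and the function names below are hypothetical stand-ins of our own, not the standardized NBMP interfaces.

```python
# Illustrative composition of media processing functions into a workflow,
# mirroring the NBMP idea of chaining network-side functions via an API.

class Pipeline:
    def __init__(self):
        self.stages = []

    def add(self, func):
        self.stages.append(func)
        return self  # allow chaining

    def run(self, media):
        for func in self.stages:
            media = func(media)
        return media

# Stand-ins for network-hosted media functions (stitching, transcoding,
# viewport extraction); real NBMP functions would run inside the network.
def stitch(frames):      return {"stitched": frames}
def transcode(media):    return {**media, "codec": "hevc"}
def extract_viewport(m): return {**m, "viewport": (90, 0)}

result = Pipeline().add(stitch).add(transcode).add(extract_viewport).run(["cam1", "cam2"])
print(result)  # {'stitched': ['cam1', 'cam2'], 'codec': 'hevc', 'viewport': (90, 0)}
```

The design point is that each stage only needs to agree on the media interchange format, which is precisely what an API-based standard can pin down while leaving the function internals to competition.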

7th edition of MPEG-2 Systems Standard and ISO Base Media File Format (ISOBMFF) with two new features

More than 20 years after its inception, development of MPEG-2 systems technology (i.e., transport/program stream) continues. New features include support for: (i) JPEG 2000 video with 4K resolution and ultra-low latency, (ii) media orchestration related metadata, (iii) sample variants, and (iv) HEVC tiles.

The partial file format enables the description of an ISOBMFF file partially received over lossy communication channels. It provides tools to describe the received data, to document transmission information (such as received or lost byte ranges and whether the corrupted/lost bytes are present in the file), and to carry repair information (such as the location of the source file, possible byte offsets in that source, and the byte stream position at which a parser can try processing a corrupted file). Depending on the communication channel, this information may be set up by the receiver or through out-of-band means.
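To make this concrete, the following sketch (our own data structures, not the normative boxes of the partial file format) tracks which byte ranges of a file were received over a lossy channel and derives the lost ranges a repair step would need to re-fetch from the source.

```python
# Minimal bookkeeping for a partially received file: record received byte
# ranges, merge overlaps, and report the gaps that still need repair.

class PartialFile:
    def __init__(self, total_size):
        self.total_size = total_size
        self.received = []  # list of (start, end) byte ranges, end exclusive

    def mark_received(self, start, end):
        self.received.append((start, end))
        self.received.sort()
        merged = [self.received[0]]
        for s, e in self.received[1:]:
            if s <= merged[-1][1]:  # overlapping or adjacent: merge
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self.received = merged

    def lost_ranges(self):
        """Byte ranges a repair step would need to re-fetch from the source."""
        lost, pos = [], 0
        for s, e in self.received:
            if pos < s:
                lost.append((pos, s))
            pos = e
        if pos < self.total_size:
            lost.append((pos, self.total_size))
        return lost

pf = PartialFile(1000)
pf.mark_received(0, 400)
pf.mark_received(600, 1000)
print(pf.lost_ranges())  # [(400, 600)]
```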

The 2nd edition of ISOBMFF adds sample variants, which are typically used to provide forensic information in the rendered sample data that can, for example, identify the specific Digital Rights Management (DRM) client which has decrypted the content. This variant framework is intended to be fully compatible with MPEG’s Common Encryption (CENC) and agnostic to the particular forensic marking system used.

Research aspects: MPEG systems standards are mainly relevant for multimedia systems research with all its characteristics. The partial file format is specifically interesting as it targets scenarios with lossy communication channels.

MPEG-G standards reach Draft International Standard for transport and compression technologies

MPEG-G provides a set of standards enabling interoperability for applications and services dealing with high-throughput deoxyribonucleic acid (DNA) sequencing. At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to the Draft International Standard (DIS) stage. These parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport, including streaming, of genomic data by interoperable applications. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Research aspects: the main focus of this work item is compression, while transport is still in its infancy. Therefore, research on the actual delivery of compressed DNA information as well as its processing is solicited.

What else happened at MPEG122?

  • Requirements is exploring new video coding tools dealing with low complexity and process enhancements.
  • The activity around the coded representation of neural networks has defined a set of vital use cases and is now soliciting test data until the next meeting.
  • The MP4 registration authority (MP4RA) has a new awesome web site http://mp4ra.org/.
  • MPEG-DASH is finally approving and working on the 3rd edition, which comprises a consolidated version of recent amendments and corrigenda.
  • CMAF started an exploration on multi-stream support, which could be relevant for tiled streaming and multi-channel audio.
  • OMAF kicked off its activity towards a 2nd edition enabling support for 3DoF+ and social VR, with the plan to reach committee draft (CD) in Oct’18. Additionally, a test framework has been proposed which allows assessing the performance of various OMAF tools. Its main focus is on video, but MPEG’s audio subgroup has a similar framework to enable subjective testing. It could be interesting to see these two frameworks combined in one way or the other.
  • MPEG-I architectures (yes, plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closely at 3DoF+, defining common test conditions and a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF are ongoing, and explorations into the compression of dense representations of light fields have been started.
  • Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools, resulting in updated versions of the test model and working draft.
Research aspects: In this section I would like to focus on DASH, CMAF, and OMAF. Multi-stream support, as mentioned above, is relevant for tiled streaming and multi-channel audio, which has been studied recently in the literature and is also highly relevant for industry. The efficient storage and streaming of such content within the file format is an important aspect that is often underrepresented in both research and standardization. The goal here is to keep the overhead low while maximizing the utility of the format to enable certain functionalities. OMAF now targets the social VR use case, which has been discussed in the research literature for a while and, finally, makes its way into standardization. An important aspect here is both user experience and quality of experience, which requires intensive subjective testing.

Finally, on May 10 MPEG will celebrate 30 years as its first meeting dates back to 1988 in Ottawa, Canada with around 30 attendees. The 122nd meeting had more than 500 attendees and MPEG has around 20 active work items. A total of more than 170 standards have been produced (that’s approx. six standards per year), where some standards have up to nine editions, like the HEVC standard. Overall, MPEG is responsible for more than 23% of all JTC 1 standards and some of them show extraordinary longevity regarding extensions, e.g., MPEG-2 systems (24 years), MPEG-4 file format (19 years), and AVC (15 years). MPEG standards serve billions of users (e.g., MPEG-1 video, MP2, MP3, AAC, MPEG-2, AVC, ISOBMFF, DASH). Some standards, five to be precise, have received Emmy awards in the past (MPEG-1, MPEG-2, AVC (2x), and HEVC).
Tag cloud generated from all existing MPEG press releases.
Thus, happy birthday MPEG! In today’s society, turning 30 marks the start of one’s high-performance era, basically the time of “compression”, i.e., applying everything learnt so far and living it out to the fullest: a truly optimistic perspective for our generation-X (millennial) standards body!

Sunday, March 18, 2018

HVEI'18: A Framework for Adaptive Delivery of Omnidirectional Video

A Framework for Adaptive Delivery of Omnidirectional Video

Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin) and Ali C. Begen (Ozyegin University / Networked Media)

Abstract: Omnidirectional or 360-degree videos are considered as a next step towards a truly immersive media experience. Such videos allow the user to change her/his viewing direction while consuming the video. The download-and-play paradigm (including DVD and Blu-ray) is replaced by streaming, and the content is hosted solely within the cloud. This paper addresses the need for a scientific framework enabling the adaptive delivery of omnidirectional video within heterogeneous environments. We consider the state-of-the-art techniques for adaptive streaming over HTTP and extend them towards omnidirectional/360-degree videos. In particular, we review the encoding and adaptive streaming options, and present preliminary results reported in the literature. Finally, we provide an overview about the ongoing standardization efforts and highlight the major open issues.

Slides:

Tuesday, February 13, 2018

MPEG news: a report from the 121st meeting, Gwangju, Korea

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The MPEG press release comprises the following topics:
  • Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
  • MPEG-G standards reach Committee Draft for metadata and APIs
  • MPEG issues Calls for Visual Test Material for Immersive Applications
  • Internet of Media Things (IoMT) reaches Committee Draft level
  • MPEG finalizes its Media Orchestration (MORE) standard
At the end I will also briefly summarize what else happened with respect to DASH, CMAF, OMAF as well as discuss future aspects of MPEG.

Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level

The Committee Draft (CD) for CDVA has been approved at the 121st MPEG meeting, which is the first formal step of the ISO/IEC approval process for a new standard. This will become a new part of MPEG-7 to support video search and retrieval applications (ISO/IEC 15938-15).

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors which can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers, so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. The CDVA standard specifies descriptors that fulfil these needs and includes (i) the components of the CDVA descriptor, (ii) its bitstream representation and (iii) the extraction process. The final standard is expected to be finished in early 2019.

CDVA introduces a new descriptor based on features which are output from a Deep Neural Network (DNN). CDVA is robust against viewpoint changes and moderate transformations of the video (e.g., re-encoding, overlays), and it supports partial matching and temporal localization of the matching content. The CDVA descriptor has a typical size of 2–4 KBytes per second of video. For typical test cases, it has been demonstrated to reach a correct matching rate of 88% (at a 1% false matching rate).
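The general principle of descriptor-based matching can be sketched as follows; the toy vectors and the cosine-similarity threshold are illustrative only and do not reflect CDVA's actual DNN-based descriptor or normative matching procedure.

```python
# Minimal sketch of descriptor-based instance search: compact per-segment
# descriptors are compared instead of the much larger source videos.
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(query_desc, database, threshold=0.9):
    """Return IDs of database entries whose descriptor matches the query."""
    return [vid for vid, desc in database.items()
            if cosine(query_desc, desc) >= threshold]

db = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.9, 0.1],
}
print(match([1.0, 0.05, 0.0], db))  # ['clip_a']
```

The interoperability argument of the standard lives exactly here: as long as extractors on different devices produce descriptors in the same space, only these few kilobytes per second need to be exchanged for querying.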

Research aspects: There are probably endless research aspects in the visual descriptor space, ranging from validation of the results achieved so far to further improving informative aspects with the goal of increasing the correct matching rate (and consequently decreasing the false matching rate). In general, however, the question is whether there is still a need for descriptors in an era of over-provisioned bandwidth, storage, and computing and the rising usage of artificial intelligence techniques such as machine learning and deep learning.

MPEG-G standards reach Committee Draft for metadata and APIs

In my previous report I introduced the MPEG-G standard for compression and transport technologies of genomic data. At the 121st MPEG meeting, metadata and APIs reached CD level. The former (metadata) provides relevant information associated with genomic data, and the latter (APIs) allows for building interoperable applications capable of manipulating MPEG-G files. Additional standardization plans for MPEG-G include the CDs for reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5), which are planned to be issued at the next, 122nd MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.

Research aspects: Metadata typically enables certain functionality which can be tested and evaluated against requirements. APIs allow building applications and services on top of the underlying functions, which could be a driver for research projects to make use of such APIs.

MPEG issues Calls for Visual Test Material for Immersive Applications

I have reported about the Omnidirectional Media Format (OMAF) in my previous report. At the 121st MPEG meeting, MPEG was working on extending OMAF functionalities to allow the modification of viewing positions, e.g., in case of head movements when using a head-mounted display, or for use with other forms of interactive navigation. Unlike OMAF which only provides 3 degrees of freedom (3DoF) for the user to view the content from a perspective looking outwards from the original camera position, the anticipated extension will also support motion parallax within some limited range which is referred to as 3DoF+. In the future with further enhanced technologies, a full 6 degrees of freedom (6DoF) will be achieved with changes of viewing position over a much larger range. To develop technology in these domains, MPEG has issued two Calls for Test Material in the areas of 3DoF+ and 6DoF, asking owners of image and video material to provide such content for use in developing and testing candidate technologies for standardization. Details about these calls can be found at https://mpeg.chiariglione.org/.

Research aspects: The good thing about test material is that it allows for reproducibility, which is an important aspect within the research community. Thus, it is more than appreciated that MPEG issues such calls, and let us hope that this material will become publicly available. Typically, this kind of visual test material targets coding, but it would also be interesting to have such test content for storage and delivery.

Internet of Media Things (IoMT) reaches Committee Draft level

The goal of IoMT is to facilitate the large-scale deployment of distributed media systems with interoperable audio/visual data and metadata exchange. This standard specifies APIs providing media things (i.e., cameras/displays and microphones/loudspeakers, possibly capable of significant processing power) with the capability of being discovered, setting up ad-hoc communication protocols, exposing usage conditions, and providing media and metadata as well as services processing them. IoMT APIs encompass a large variety of devices, not just connected cameras and displays but also sophisticated devices such as smart glasses, image/speech analyzers, and gesture recognizers. IoMT enables the expression of the economic value of resources (media and metadata) and of associated processing in terms of digital tokens leveraged by the use of blockchain technologies.

Research aspects: The main focus of IoMT is APIs, which provide easy and flexible access to the underlying devices’ functionality and, thus, are an important factor in enabling research within this interesting domain. For example, using these APIs to enable communication among these various media things could bring up new forms of interaction with these technologies.

MPEG finalizes its Media Orchestration (MORE) standard

The MPEG "Media Orchestration" (MORE) standard reached Final Draft International Standard (FDIS), the final stage of development before being published by ISO/IEC. The scope of the Media Orchestration standard is as follows:
  • It supports the automated combination of multiple media sources (i.e., cameras, microphones) into a coherent multimedia experience.
  • It supports rendering multimedia experiences on multiple devices simultaneously, again giving a consistent and coherent experience.
  • It contains tools for orchestration in time (synchronization) and space.
MPEG expects the Media Orchestration standard to be especially useful in immersive media settings. This applies notably in social virtual reality (VR) applications, where people share a VR experience and are able to communicate about it. Media Orchestration is expected to allow synchronizing the media experience for all users and to give them a spatially consistent experience, as it is important for a social VR user to be able to understand when other users are looking at them.

Research aspects: This standard enables the social multimedia experience proposed in the literature. Interestingly, the W3C is working on something similar, referred to as the timing object, and it would be interesting to see whether these approaches have some commonalities.
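The shared-timeline idea underlying both media orchestration and the timing object can be sketched as follows; the TimelineClock class is our own illustration, not an API from either specification. Every device derives its media position from one common clock, so playback stays synchronized without device-to-device messaging.

```python
# Shared-timeline synchronization sketch: a timeline maps a common wall
# clock t to a media position via pos = p0 + velocity * (t - t0).

class TimelineClock:
    def __init__(self, t0, p0=0.0, velocity=1.0):
        self.t0, self.p0, self.velocity = t0, p0, velocity

    def position(self, now):
        """Media position at shared wall-clock time `now` (seconds)."""
        return self.p0 + self.velocity * (now - self.t0)

# Two devices that agree on (t0, p0, velocity) compute identical positions
# from the same wall clock, i.e., they render the same media instant:
shared = dict(t0=100.0, p0=0.0, velocity=1.0)
device_a = TimelineClock(**shared)
device_b = TimelineClock(**shared)
print(device_a.position(130.0), device_b.position(130.0))  # 30.0 30.0
```

In practice the hard part is distributing (t0, p0, velocity) and a common clock with bounded skew, which is where the orchestration tools for synchronization in time come in.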

What else happened at the MPEG meeting?

DASH is fully in maintenance mode and we are still waiting for the 3rd edition which is supposed to be a consolidation of existing corrigenda and amendments. Currently only minor extensions are proposed and conformance/reference software is being updated. Similar things can be said for CMAF where we have one amendment and one corrigendum under development. Additionally, MPEG is working on CMAF conformance. OMAF has reached FDIS at the last meeting and MPEG is working on reference software and conformance also. It is expected that in the future we will see additional standards and/or technical reports defining/describing how to use CMAF and OMAF in DASH.

Regarding the future video codec, the call for proposals has been out since the last meeting as announced in my previous report, and responses are due for the next meeting. Thus, it is expected that the 122nd MPEG meeting will be the place to be in terms of MPEG’s future video codec. Speaking about the future, shortly after the 121st MPEG meeting, Leonardo Chiariglione published a blog post entitled “a crisis, the causes and a solution”, which is related to HEVC licensing, Alliance for Open Media (AOM), and possible future options. The blog post certainly caused some reactions within the video community at large and I think this was also intended. Let’s hope it will galvanize the video industry -- not to push the button -- but to start addressing and resolving the issues. As pointed out in one of my other blog posts about what to care about in 2018, the upcoming MPEG meeting in April 2018 is certainly a place to be. Additionally, it highlights some conferences related to various aspects also discussed in MPEG which I'd like to republish here:
  • QoMEX -- Int'l Conf. on Quality of Multimedia Experience -- will be hosted in Sardinia, Italy from May 29-31, which is THE conference to be for QoE of multimedia applications and services. Submission deadline is January 15/22, 2018.
  • MMSys -- Multimedia Systems Conf. -- and specifically Packet Video, which will be on June 12 in Amsterdam, The Netherlands. Packet Video is THE adaptive streaming scientific event 2018. Submission deadline is March 1, 2018.
  • Additionally, you might be interested in ICME (July 23-27, 2018, San Diego, USA), ICIP (October 7-10, 2018, Athens, Greece; specifically in the context of video coding), and PCS (June 24-27, 2018, San Francisco, CA, USA; also in the context of video coding).
  • The DASH-IF academic track hosts special events at MMSys (Excellence in DASH Award) and ICME (DASH Grand Challenge).
  • MIPR -- 1st Int'l Conf. on Multimedia Information Processing and Retrieval -- will be in Miami, Florida, USA from April 10-12, 2018. It has a broad range of topics including networking for multimedia systems as well as systems and infrastructures.

Thursday, December 14, 2017

MPEG news: a report from the 120th meeting, Macau, China

MPEG Meeting Plenary
The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The MPEG press release comprises the following topics:
  • Point Cloud Compression – MPEG evaluates responses to call for proposal and kicks off its technical work 
  • The omnidirectional media format (OMAF) has reached its final milestone 
  • MPEG-G standards reach Committee Draft for compression and transport technologies of genomic data 
  • Beyond HEVC – The MPEG & VCEG call to set the next standard in video compression 
  • MPEG adds better support for mobile environment to MMT 
  • New standard completed for Internet Video Coding 
  • Evidence of new video transcoding technology using side streams 

Point Cloud Compression

At its 120th meeting, MPEG analysed the technologies submitted by nine industry leaders as responses to the Call for Proposals (CfP) for Point Cloud Compression (PCC). These technologies address the lossless or lossy coding of 3D point clouds with associated attributes such as colour and material properties. Point clouds are unordered sets of points in a 3D space, typically captured using various setups of multiple cameras, depth sensors, LiDAR scanners, etc., but they can also be generated synthetically and are in use in several industries. They have recently emerged as representations of the real world enabling immersive forms of interaction, navigation, and communication. Point clouds are typically represented by extremely large amounts of data, which presents a significant barrier for mass market applications. Thus, MPEG has issued a Call for Proposals seeking technologies that allow the reduction of point cloud data for its intended applications. After a formal objective and subjective evaluation campaign, MPEG selected three technologies as starting points for the test models for static, animated, and dynamically acquired point clouds. A key conclusion of the evaluation was that state-of-the-art point cloud compression can be significantly improved by leveraging decades of 2D video coding tools and combining 2D and 3D compression technologies. Such an approach provides synergies with existing hardware and software infrastructures for rapid deployment of new immersive experiences.
Although the initial selection of technologies for point cloud compression was concluded at the 120th MPEG meeting, it can also be seen as a kick-off for its scientific evaluation and various further developments, including optimizations thereof. It is expected that various scientific conferences will focus on point cloud compression and may open calls for grand challenges, like, for example, at IEEE ICME 2018.
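The projection-based principle highlighted by the evaluation can be sketched minimally: map 3D points onto a 2D grid so that mature 2D video codecs can compress the result. Real patch generation in the emerging test models is far more elaborate; the simple front-view depth map below is purely our illustration.

```python
# Toy orthographic front projection of a point cloud to a 2-D depth map:
# for each (x, y) grid cell, keep the z of the nearest point. The resulting
# 2-D array is exactly the kind of image a video codec can then compress.

def depth_map(points, width, height):
    """Project (x, y, z) points onto a width x height grid; nearest z wins."""
    grid = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if 0 <= x < width and 0 <= y < height:
            if grid[y][x] is None or z < grid[y][x]:
                grid[y][x] = z
    return grid

points = [(0, 0, 5), (0, 0, 3), (1, 1, 7)]  # two points share cell (0, 0)
print(depth_map(points, 2, 2))  # [[3, None], [None, 7]]
```

Note what the toy version already exposes: occluded points (the z=5 sample behind z=3) are lost in a single projection, which is why practical schemes use multiple projection planes and patches.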

Omnidirectional Media Format (OMAF)

The understanding of virtual reality’s (VR) potential is growing, but market fragmentation caused by the lack of interoperable formats for the storage and delivery of such content stifles VR’s market potential. MPEG’s recently started project, referred to as Omnidirectional Media Format (OMAF), has reached Final Draft International Standard (FDIS) at its 120th meeting. It includes
  • equirectangular projection and cubemap projection as projection formats; 
  • signalling of metadata required for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data; and 
  • a selection of audio-visual codecs for this application.
It also includes technologies to arrange video pixel data in numerous ways to improve compression efficiency and reduce the size of video, a major bottleneck for VR applications and services. The standard also includes technologies for the delivery of OMAF content with MPEG-DASH and MMT.
MPEG has defined a format comprising a minimal set of tools to enable interoperability among implementers of the standard. Various aspects are deliberately excluded from the normative parts to foster innovation leading to novel products and services. This enables us -- researchers and practitioners -- to experiment with these new formats in various ways and to focus on informative aspects where competition can typically be found. For example, efficient means for encoding and packaging omnidirectional/360-degree media content and its adaptive streaming, including support for (ultra-)low latency, will become a big issue in the near future.
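The idea of arranging video pixel data to cut bitrate (region-wise packing in OMAF terminology) can be sketched in a few lines: regions outside the viewer's likely viewport are stored at reduced resolution. The frame representation, the region split, and the 2x downsampling factor below are illustrative assumptions, not values taken from the standard.

```python
def pack_regions(frame, front_cols):
    """Keep the first `front_cols` columns of each row at full resolution;
    downsample the remaining columns horizontally by a factor of two.
    frame is a 2D list of pixel values."""
    packed = []
    for row in frame:
        front = row[:front_cols]
        back = row[front_cols::2]        # drop every other sample
        packed.append(front + back)
    return packed

frame = [[c for c in range(8)] for _ in range(2)]   # tiny 8x2 test frame
packed = pack_regions(frame, front_cols=4)
print(len(packed[0]))  # 6 -- 4 full-resolution + 2 downsampled columns
```

A real packing scheme also signals metadata so the decoder can invert the arrangement before rendering; that signalling is exactly what the normative part of OMAF standardizes.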

MPEG-G: Compression and Transport Technologies of Genomic Data

The availability of high-throughput DNA sequencing technologies opens new perspectives in the treatment of several diseases, making possible the introduction of new global approaches in public health known as “precision medicine”. While routine DNA sequencing in the doctor’s office is still not current practice, medical centers have begun to use sequencing to identify cancer and other diseases and to find effective treatments. As DNA sequencing technologies produce extremely large amounts of data and related information, the ICT costs of storage, transmission, and processing are also very high. The MPEG-G standard addresses and solves the problem of efficient and economical handling of genomic data by providing new
  • compression technologies (ISO/IEC 23092-2) and 
  • transport technologies (ISO/IEC 23092-1), 
both of which reached Committee Draft level at the 120th meeting.
Additionally, the Committee Drafts for
  • metadata and APIs (ISO/IEC 23092-3) and 
  • reference software (ISO/IEC 23092-4) 
are scheduled for the next MPEG meeting and the goal is to publish Draft International Standards (DIS) at the end of 2018.
This new type of (media) content, which requires compression and transport technologies, is emerging within the multimedia community at large and, thus, input is welcome.

Beyond HEVC – The MPEG & VCEG Call to set the Next Standard in Video Compression

The 120th MPEG meeting marked the first major step toward the next generation of video coding standards in the form of a joint Call for Proposals (CfP) with ITU-T SG16’s VCEG. After two years of informal collaborative exploration studies and a gathering of evidence that successfully concluded at the 118th MPEG meeting, MPEG and ITU-T SG16 agreed to issue the CfP for future video coding technology with compression capabilities that significantly exceed those of the HEVC standard and its current extensions. They also formalized an agreement on the formation of a joint collaborative team called the “Joint Video Experts Team” (JVET) to work on the development of the new planned standard, pending the outcome of the CfP, which will be evaluated at the 122nd MPEG meeting in April 2018. To evaluate the proposed compression technologies, formal subjective tests will be performed using video material submitted by proponents in February 2018. The CfP includes the testing of technology for 360° omnidirectional video coding and the coding of content with high dynamic range and wide colour gamut in addition to conventional standard-dynamic-range camera content. Anticipating a strong response to the call, a “test model” draft design is expected to be selected in 2018, with development of a potential new standard completing in late 2020.
The major goal of a new video coding standard is to be better than its predecessor (HEVC). Typically, this “better” is quantified as 50%, which means that it should be possible to encode the video at the same quality with half the bitrate, or at a significantly higher quality with the same bitrate. However, at this time the “Joint Video Experts Team” (JVET) of MPEG and ITU-T SG16 faces competition from the Alliance for Open Media, which is working on AV1. In any case, we are looking forward to an exciting time frame from now until this new codec is ratified and to seeing how it will perform compared to AV1. Multimedia systems and applications will also benefit from new codecs, which will gain traction as soon as the first implementations become available (note that AV1 is already available as open source and is continuously being developed further).
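The "50% at the same quality" claim is usually measured by comparing the bitrates two codecs need to reach a matched quality level. The sketch below is a simplified stand-in for the Bjontegaard-delta (BD-rate) metric commonly used in codec evaluations: it linearly interpolates each codec's rate-quality curve at one quality point instead of integrating over a fitted curve, and the operating points are made-up illustrative numbers.

```python
def bitrate_saving(ref_points, test_points, quality):
    """Interpolate each codec's bitrate at `quality` and return the test
    codec's bitrate saving relative to the reference, in percent.
    Points are (bitrate, quality) tuples sorted by increasing quality."""
    def rate_at(points, q):
        for (r0, q0), (r1, q1) in zip(points, points[1:]):
            if q0 <= q <= q1:
                t = (q - q0) / (q1 - q0)
                return r0 + t * (r1 - r0)
        raise ValueError("quality outside measured range")
    r_ref = rate_at(ref_points, quality)
    r_test = rate_at(test_points, quality)
    return 100.0 * (r_ref - r_test) / r_ref

hevc = [(2000, 34.0), (4000, 38.0)]   # illustrative (kbps, PSNR-dB) points
new  = [(1000, 34.0), (2000, 38.0)]
print(bitrate_saving(hevc, new, 36.0))  # 50.0
```

Real evaluations fit curves through four or more rate points and average over the quality range; subjective tests complement such objective numbers, as noted above.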

MPEG adds Better Support for Mobile Environment to MPEG Media Transport (MMT)

MPEG has approved the Final Draft Amendment (FDAM) to MPEG Media Transport (MMT; ISO/IEC 23008-1:2017), which is referred to as “MMT enhancements for mobile environments”. In order to reflect industry needs regarding MMT, which has been well adopted by broadcast standards such as ATSC 3.0 and Super Hi-Vision, it addresses several important issues concerning the efficient use of MMT in mobile environments. For example, it adds a distributed resource identification message to facilitate multipath delivery and a transition request message to change the delivery path of an active session. This amendment also introduces the concept of an MMT-aware network entity (MANE), which might be placed between the original server and the client, and provides a detailed description of how to use it for both improving efficiency and reducing the delay of delivery. Additionally, this amendment provides a method to use WebSockets to set up and control an MMT session/presentation.

New Standard Completed for Internet Video Coding

A new standard for video coding suitable for the internet as well as other video applications was completed at the 120th MPEG meeting. The Internet Video Coding (IVC) standard was developed with the intention of providing the industry with an “Option 1” video coding standard. In ISO/IEC language, this refers to a standard for which patent holders have declared a willingness to grant licenses free of charge to an unrestricted number of applicants for all necessary patents on a worldwide, non-discriminatory basis and under other reasonable terms and conditions, to enable others to make, use, and sell implementations of the standard. At the time of completion of the IVC standard, the specification contained no identified necessary patent rights except those available under Option 1 licensing terms. During the development of IVC, MPEG removed from the draft standard any necessary patent rights that it was informed were not available under such Option 1 terms, and MPEG is optimistic about the outlook for the new standard. MPEG encourages interested parties to provide information about any other similar cases. The IVC standard has roughly similar compression capability to the earlier AVC standard, which has become the most widely deployed video coding technology in the world. Tests have been conducted to verify IVC’s strong technical capability, and the new standard has also been shown to have relatively modest implementation complexity requirements.

Evidence of new Video Transcoding Technology using Side Streams

Following a “Call for Evidence” (CfE) issued by MPEG in July 2017, evidence was evaluated at the 120th MPEG meeting to investigate whether video transcoding technology has been developed for transcoding assisted by side data streams that is capable of significantly reducing the computational complexity without reducing compression efficiency. The evaluations of the four responses received included comparisons of the technology against adaptive bit-rate streaming using simulcast as well as against traditional transcoding using full video re-encoding. The responses span the compression efficiency space between simulcast and full transcoding, with trade-offs between the bit rate required for distribution within the network and the bit rate required for delivery to the user. All four responses provided a substantial computational complexity reduction compared to transcoding using full re-encoding. MPEG plans to further investigate transcoding technology and is soliciting expressions of interest from industry on the need for standardization of such assisted transcoding using side data streams.

MPEG currently works on two related topics, referred to as network-distributed video coding (NDVC) and network-based media processing (NBMP). Both activities involve the network, which is increasingly evolving into a highly distributed compute and delivery platform, as opposed to a mere bit pipe that is supposed to deliver data as fast as possible from A to B. This phenomenon is also interesting when looking at developments around 5G, which is actually much more than just a radio access technology. These activities are certainly worth monitoring as they basically contribute to making networked media resources accessible or even programmable. In this context, I would like to refer the interested reader to the December'17 theme of the IEEE Computer Society Computing Now, which is about Advancing Multimedia Content Distribution.
Publicly available documents from the 120th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Gwangju, Korea, January 22-26, 2018. Feel free to contact Christian Timmerer for any questions or comments.
Some of the activities reported above are considered within the Call for Papers at 23rd Packet Video Workshop (PV 2018) co-located with ACM MMSys 2018 in Amsterdam, The Netherlands. Topics of interest include (but are not limited to):
  • Adaptive media streaming, and content storage, distribution and delivery 
  • Network-distributed video coding and network-based media processing 
  • Next-generation/future video coding, point cloud compression 
  • Audiovisual communication, surveillance and healthcare systems 
  • Wireless, mobile, IoT, and embedded systems for multimedia applications 
  • Future media internetworking: information-centric networking and 5G 
  • Immersive media: virtual reality (VR), augmented reality (AR), 360° video and multi-sensory systems, and its streaming 
  • Machine learning in media coding and streaming systems 
  • Standardization: DASH, MMT, CMAF, OMAF, MiAF, WebRTC, MSE, EME, WebVR, Hybrid Media, WAVE, etc.
  • Applications: social media, game streaming, personal broadcast, healthcare, industry 4.0, education, transportation, etc. 
Important dates
  • Submission deadline: March 1, 2018 
  • Acceptance notification: April 9, 2018 
  • Camera-ready deadline: April 19, 2018

Monday, September 4, 2017

MPEG news: a report from the 119th meeting, Turin, Italy

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.


The MPEG press release comprises the following topics:
  • Evidence of New Developments in Video Compression Coding
  • Call for Evidence on Transcoding for Network Distributed Video Coding
  • 2nd Edition of Storage of Sample Variants reaches Committee Draft
  • New Technical Report on Signalling, Backward Compatibility and Display Adaptation for HDR/WCG Video Coding
  • Draft Requirements for Hybrid Natural/Synthetic Scene Data Container

Evidence of New Developments in Video Compression Coding

At the 119th MPEG meeting, responses to the previously issued call for evidence were evaluated, and all of them successfully demonstrated evidence. The call requested responses for use cases of video coding technology in three categories:
  • standard dynamic range (SDR) — two responses;
  • high dynamic range (HDR) — two responses; and
  • 360° omnidirectional video — four responses.
The evaluation of the responses included subjective testing and an assessment of the performance of the “Joint Exploration Model” (JEM).

The results indicate significant gains over HEVC for a considerable number of test cases, with comparable subjective quality at 40-50% less bit rate compared to HEVC for the SDR and HDR test cases, with some positive outliers (i.e., higher bit rate savings). Thus, the MPEG-VCEG Joint Video Exploration Team (JVET) concluded that evidence exists of compression technology that may significantly outperform HEVC after further development to establish a new standard. As a next step, the plan is to issue a call for proposals at the 120th MPEG meeting (October 2017), with responses expected to be evaluated at the 122nd MPEG meeting (April 2018).

We already witness an increasing number of research articles addressing video coding technologies with capabilities beyond HEVC, and this number will further increase in the future. The main driving force is over-the-top (OTT) delivery, which calls for more efficient bandwidth utilization. However, competition is also increasing with the emergence of AOMedia's AV1, and we may also observe an increasing number of articles in that direction, including evaluations thereof. An interesting aspect is that the number of use cases is also increasing (e.g., see the different categories above), which adds further challenges to the "complex video problem".

Call for Evidence on Transcoding for Network Distributed Video Coding

The call for evidence on transcoding for network distributed video coding targets interested parties possessing technology that provides transcoding of video at lower computational complexity than a full re-encode. The primary application is adaptive bitrate streaming, where the highest-bitrate stream is transcoded into lower-bitrate streams. It is expected that responses may use “side streams” (or side information; some may call it metadata) accompanying the highest-bitrate stream to assist in the transcoding process. MPEG expects submissions for the 120th MPEG meeting, where compression efficiency and computational complexity will be assessed.

Transcoding has been discussed already for a long time and I can certainly recommend this article from 2005 published in the Proceedings of the IEEE. The question is, what is different now, 12 years later, and what metadata (or side streams/information) is required for interoperability among different vendors (if any)?

A Brief Overview of Remaining Topics...

  • The 2nd edition of storage of sample variants reaches Committee Draft and expands its usage to MPEG-2 transport stream whereas the first edition primarily focused on ISO base media file format.
  • The new technical report for high dynamic range (HDR) and wide colour gamut (WCG) video coding comprises a survey of various signaling mechanisms including backward compatibility and display adaptation.
  • MPEG issues draft requirements for a scene representation media container enabling the interchange of content for authoring and rendering rich immersive experiences which is currently referred to as hybrid natural/synthetic scene (HNSS) data container.

Other MPEG (Systems) Activities at the 119th Meeting

DASH is fully in maintenance mode as only minor enhancements/corrections have been discussed, including contributions to conformance and reference software. The omnidirectional media format (OMAF) is certainly the hottest topic within MPEG systems; it is currently between two stages (i.e., between DIS and FDIS) and, thus, a study of DIS has been approved, and national bodies are kindly requested to take this into account when casting their votes (incl. comments). The study of DIS comprises format definitions with respect to the coding and storage of omnidirectional media including audio and video (aka 360°). The common media application format (CMAF) was ratified at the last meeting and awaits publication by ISO. In the meantime, CMAF is focusing on conformance and reference software as well as amendments regarding various media profiles. Finally, requirements for a multi-image application format (MiAF) have been available since the last meeting, and at the 119th MPEG meeting a working draft was approved. MiAF will be based on HEIF and the goal is to define additional constraints to simplify its file format options.

We have successfully demonstrated live 360 adaptive streaming as described here, but we expect various improvements from standards available and under development within MPEG. Research aspects in these areas include performance gains and evaluations with respect to bandwidth efficiency in open networks, as well as how these standardization efforts could be used to enable new use cases.

Publicly available documents from the 119th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Macau, China, October 23-27, 2017. Feel free to contact me for any questions or comments.

Friday, February 10, 2017

MPEG news: a report from the 117th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.
MPEG News Archive
The 117th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Committee Draft of the Omnidirectional Media Application Format (OMAF)
  • MPEG-H 3D Audio Verification Test Report
  • MPEG Workshop on 5-Year Roadmap Successfully Held in Geneva
  • Call for Proposals (CfP) for Point Cloud Compression (PCC)
  • Preliminary Call for Evidence on video compression with capability beyond HEVC
  • MPEG issues Committee Draft of the Media Orchestration (MORE) Standard
  • Technical Report on HDR/WCG Video Coding
In this blog post, I'd like to focus on the topics related to multimedia communication. Thus, let's start with OMAF.

Omnidirectional Media Application Format (OMAF)

Real-time entertainment services deployed over the open, unmanaged Internet – streaming audio and video – now account for more than 70% of the evening traffic in North American fixed access networks, and it is assumed that this figure will reach 80 percent by 2020. More and more such bandwidth-hungry applications and services are pushing onto the market, including immersive media services such as virtual reality and, specifically, 360-degree videos. However, the lack of appropriate standards and, consequently, reduced interoperability is becoming an issue. Thus, MPEG has started a project referred to as the Omnidirectional Media Application Format (OMAF). The first milestone of this standard has been reached, and the committee draft (CD) was approved at the 117th MPEG meeting. Such application formats "are essentially superformats that combine selected technology components from MPEG (and other) standards to provide greater application interoperability, which helps satisfy users' growing need for better-integrated multimedia solutions" [MPEG-A]. In the context of OMAF, the following aspects are defined:
  • Equirectangular projection format (note: others might be added in the future)
  • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
  • Storage format: ISO base media file format (ISOBMFF)
  • Codecs: High Efficiency Video Coding (HEVC) and MPEG-H 3D audio
OMAF is the first specification defined as part of a bigger project currently referred to as ISO/IEC 23090 -- Immersive Media (Coded Representation of Immersive Media). It currently has the acronym MPEG-I; we previously used MPEG-VR, which is now replaced by MPEG-I (and that still might change in the future). It is expected that the standard will become Final Draft International Standard (FDIS) by Q4 of 2017. Interestingly, it does not include AVC and AAC, probably the most obvious candidates for video and audio codecs, which have been massively deployed in the last decade and probably will still be a major dominator (and also denominator) in upcoming years. On the other hand, the equirectangular projection format is currently the only one defined, as it is already broadly used in off-the-shelf hardware/software solutions for the creation of omnidirectional/360-degree videos. Finally, the metadata formats enabling the rendering of 360-degree monoscopic and stereoscopic video are highly appreciated. A solution for MPEG-DASH based on AVC/AAC utilizing the equirectangular projection format for both monoscopic and stereoscopic video is shown as part of Bitmovin's solution for VR and 360-degree video.
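The equirectangular projection mentioned above is conceptually simple, which is one reason for its broad adoption: a viewing direction on the unit sphere maps linearly to a pixel position in the frame. The sketch below illustrates that mapping; the function name, argument conventions (yaw/pitch in radians, yaw 0 at frame centre), and frame size are my own assumptions, not definitions from the OMAF specification.

```python
import math

def sphere_to_equirect(yaw, pitch, width, height):
    """Map a viewing direction to equirectangular pixel coordinates.
    yaw in [-pi, pi), pitch in [-pi/2, pi/2] -> (u, v).
    Longitude maps linearly to the horizontal axis, latitude to the vertical."""
    u = (yaw / (2 * math.pi) + 0.5) * width
    v = (0.5 - pitch / math.pi) * height
    return u, v

# Looking straight ahead (yaw=0, pitch=0) lands at the frame centre:
print(sphere_to_equirect(0.0, 0.0, 3840, 1920))  # (1920.0, 960.0)
```

The well-known drawback, visible from the linear latitude mapping, is heavy oversampling near the poles -- one motivation for the alternative projections MPEG considered adding later.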

Research aspects related to OMAF can be summarized as follows:
  • HEVC supports tiles which allow for efficient streaming of omnidirectional video, but HEVC is not as widely deployed as AVC. Thus, it would be interesting to investigate how to mimic such a tile-based streaming approach utilizing AVC.
  • The question of how to efficiently encode and package HEVC tile-based video is an open issue and calls for a tradeoff between tile flexibility and coding efficiency.
  • When combined with MPEG-DASH (or similar), there's a need to update the adaptation logic, as tiles add yet another dimension that needs to be considered in order to provide a good Quality of Experience (QoE).
  • QoE is a big issue here and not well covered in the literature. Various aspects are worth investigating, including a comprehensive dataset to enable reproducibility of research results in this domain. Finally, as omnidirectional video allows for interactivity, the user experience is also becoming an issue that needs to be covered by the research community.
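The core of the tile-based adaptation logic discussed in the bullets above is viewport-dependent tile selection: fetch the tiles intersecting the current viewport at high quality and the rest at low quality. The following is a toy sketch of that selection for the horizontal tile columns of an equirectangular layout; the grid size, field of view, and coordinate convention (yaw 0 at the panorama centre) are illustrative assumptions, not values from any standard.

```python
def visible_tile_columns(yaw_deg, fov_deg, num_columns):
    """Return the set of tile-column indices covering the horizontal span
    [yaw - fov/2, yaw + fov/2]; wrap-around at +/-180 degrees is handled."""
    tile_width = 360.0 / num_columns
    lo = yaw_deg - fov_deg / 2.0
    hi = yaw_deg + fov_deg / 2.0
    cols = set()
    deg = lo
    while deg <= hi:
        # map yaw to panorama coordinates [0, 360) with yaw 0 at the centre
        cols.add(int(((deg % 360.0) + 180.0) % 360.0 // tile_width))
        deg += tile_width / 2.0   # sample densely enough to hit every tile
    return cols

print(sorted(visible_tile_columns(0.0, 90.0, 8)))  # [3, 4, 5]
```

A real client must additionally predict head motion, since the tiles requested now are only rendered one segment duration later -- which is exactly where the QoE questions raised above come in.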
A second topic I'd like to highlight in this blog post is related to the preliminary call for evidence on video compression with capability beyond HEVC.

Preliminary Call for Evidence on video compression with capability beyond HEVC

A call for evidence is issued to determine whether sufficient technological potential exists to start a more rigorous phase of standardization. Currently, MPEG together with VCEG have developed a Joint Exploration Model (JEM) algorithm that is already known to provide bit rate reductions in the range of 20-30% for relevant test cases, as well as subjective quality benefits. The goal of this new standard -- with a preliminary target date for completion around late 2020 -- is to develop technology providing better compression capability than the existing standard, not only for conventional video material but also for other domains such as HDR/WCG or VR/360-degree video. An important aspect in this area is certainly over-the-top video delivery (as with MPEG-DASH), which includes features such as scalability and Quality of Experience (QoE). Scalable video coding has been added to video coding standards since MPEG-2 but never reached widespread adoption. That might change if it becomes a prime-time feature of a new video codec, as scalable video coding clearly shows benefits for dynamic adaptive streaming over HTTP. QoE has already found its way into video coding, at least when it comes to evaluating the results, where subjective tests are now an integral part of every new video codec developed by MPEG (in addition to the usual PSNR measurements). Therefore, the most interesting research topics from a multimedia communication point of view would be to optimize DASH-like delivery of such new codecs with respect to scalability and QoE. Note that if you don't like scalable video coding, feel free to propose something else as long as it reduces storage and networking costs significantly.

MPEG Workshop “Global Media Technology Standards for an Immersive Age”

On January 18, 2017, MPEG successfully held a public workshop on “Global Media Technology Standards for an Immersive Age”, hosting a series of keynotes from Bitmovin, DVB, Orange, Sky Italia, and Technicolor. Stefan Lederer, CEO of Bitmovin, discussed today's and future challenges with new forms of content like 360°, AR, and VR. All slides are available here, and MPEG took this feedback into consideration in an update of its 5-year standardization roadmap. David Wood (EBU) reported on the DVB VR study mission and Ralf Schaefer (Technicolor) presented a snapshot of VR services. Gilles Teniou (Orange) discussed video formats for VR, pointing out a new opportunity to increase the content value but also raising the question of what is missing today. Finally, Massimo Bertolotti (Sky Italia) introduced his view on the age of immersive media experiences.

Overall, the workshop was well attended and, as mentioned above, MPEG is currently working on a new standards project related to immersive media. Currently, this project comprises five parts. The first part comprises a technical report describing the scope (incl. the kind of system architecture), use cases, and applications. The second part is OMAF (see above), and the third/fourth parts are related to immersive video and audio, respectively. Part five is about point cloud compression.

For those interested, please check out the slides from industry representatives in this field and draw your own conclusions about what could be interesting for your own research. I'm happy to see any reactions, hints, etc. in the comments.

Finally, let's have a look what happened related to MPEG-DASH, a topic with a long history on this blog.

MPEG-DASH and CMAF: Friend or Foe?

For MPEG-DASH and CMAF it was a meeting "in between" official standardization stages. MPEG-DASH experts are still working on the third edition, which will be a consolidated version of the 2nd edition and various amendments and corrigenda. In the meantime, MPEG issued a white paper on the new features of MPEG-DASH, which I would like to highlight here.
  • Spatial Relationship Description (SRD): allows describing tiles and regions of interest for the partial delivery of media presentations. This is highly related to OMAF and VR/360-degree video streaming.
  • External MPD linking: this feature allows describing the relationship between a single program/channel and a preview mosaic channel that contains all channels at once within the MPD.
  • Period continuity: a simple signaling mechanism to indicate whether one period is a continuation of the previous one, which is relevant for ad insertion or live programs.
  • MPD chaining: allows chaining two or more MPDs to each other, e.g., a pre-roll ad when joining a live program.
  • Flexible segment format for broadcast TV: separates the signaling of switching points and random access points in each stream; thus, the content can be encoded with good compression efficiency while still allowing a higher number of random access points, but with a lower frequency of switching points.
  • Server and network-assisted DASH (SAND): enables asynchronous network-to-client and network-to-network communication of quality-related assisting information.
  • DASH with server push and WebSockets: basically addresses issues related to the HTTP/2 push feature and WebSockets.
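To make the SRD feature above concrete, here is a minimal parser for the comma-separated value string carried in SRD descriptors, reflecting my reading of ISO/IEC 23009-1: "source_id,object_x,object_y,object_width,object_height,total_width,total_height". Treat this field layout as an assumption and check it against the specification (which also allows optional trailing fields) before relying on it.

```python
def parse_srd(value):
    """Parse an SRD value string into a dict of named integer fields.
    The assumed layout: source id, then the tile's position and size,
    then the dimensions of the full (e.g., 360-degree) source picture."""
    fields = [int(f) for f in value.split(",")]
    keys = ("source_id", "object_x", "object_y",
            "object_width", "object_height", "total_width", "total_height")
    if len(fields) < len(keys):
        raise ValueError("unexpected number of SRD fields")
    return dict(zip(keys, fields))

# A tile covering the top-left quarter of a 3840x1920 panorama:
srd = parse_srd("0,0,0,1920,960,3840,1920")
print(srd["object_width"], srd["total_width"])  # 1920 3840
```

A streaming client would use such parsed coordinates to decide which spatial Adaptation Sets intersect the current viewport -- the link between SRD and the tile-based 360-degree streaming discussed earlier.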
CMAF issued a study document which captures the current progress, and all national bodies are encouraged to take this into account when commenting on the Committee Draft (CD). To answer the question in the headline above, it looks more and more like DASH and CMAF will become friends -- let's hope that the friendship lasts for a long time.

What else happened at the MPEG meeting?

  • Committee Draft MORE (note: type 'man more' in any unix/linux/mac terminal and you'll get 'less - opposite of more' ;): MORE stands for “Media Orchestration” and provides a specification that enables the automated combination of multiple media sources (cameras, microphones) into a coherent multimedia experience. Additionally, it targets use cases where a multimedia experience is rendered on multiple devices simultaneously, again giving a consistent and coherent experience.
  • Technical Report on HDR/WCG Video Coding: This technical report comprises conversion and coding practices for High Dynamic Range (HDR) and Wide Colour Gamut (WCG) video coding (ISO/IEC 23008-14). The purpose of this document is to provide a set of publicly referenceable recommended guidelines for the operation of AVC or HEVC systems adapted for compressing HDR/WCG video for consumer distribution applications.
  • CfP Point Cloud Compression (PCC): This call solicits technologies for the coding of 3D point clouds with associated attributes such as color and material properties. It will be part of the immersive media project introduced above.
  • MPEG-H 3D Audio verification test report: This report presents results of four subjective listening tests that assessed the performance of the Low Complexity Profile of MPEG-H 3D Audio. The tests covered a range of bit rates and a range of “immersive audio” use cases (i.e., from 22.2 down to 2.0 channel presentations). Seven test sites participated in the tests with a total of 288 listeners.
The next MPEG meeting will be held in Hobart, April 3-7, 2017. Feel free to contact us for any questions or comments.