
Thursday, July 16, 2020

MPEG131 Press Release: WG11 (MPEG) issues a Call for Proposals on extension and improvements to ISO/IEC 23092 standard series

MPEG131 Press Release: Index

WG11 (MPEG) issues a Call for Proposals on extension and improvements to ISO/IEC 23092 standard series

The current MPEG-G standard series (ISO/IEC 23092) is the first generation of MPEG standards that address the representation, compression, and transport of genome sequencing data, supporting, with a single unified approach, data ranging from the output of sequencing machines up to secondary and tertiary analysis. New technology for compressing and indexing a wide variety of annotation data is currently in an advanced standardization phase.

In line with the traditional MPEG practice of investigating and applying whenever possible improvements to the performance and functionality of its standards, at its 131st meeting, MPEG has issued a Call for Proposals (CfP) addressing two specific objectives: (i) to increase the speed performance of massively parallel codec implementations and (ii) to enable advanced queries and search capabilities on the compressed data.

Answers to the CfP are expected to be evaluated prior to the 132nd MPEG meeting. The best-performing technologies are expected to be introduced in a new high-performance profile of the current ISO/IEC 23092 standard series.

Wednesday, November 27, 2019

MPEG news: a report from the 128th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The 128th MPEG meeting concluded on October 11, 2019 in Geneva, Switzerland with the following topics:
  • Low Complexity Enhancement Video Coding (LCEVC) Promoted to Committee Draft
  • 2nd Edition of Omnidirectional Media Format (OMAF) has reached the first milestone
  • Genomic Information Representation – Part 4 Reference Software and Part 5 Conformance Promoted to Draft International Standard
The corresponding press release of the 128th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/128.
In this report we will focus on video coding aspects (i.e., LCEVC) and immersive media applications (i.e., OMAF). At the end, we will provide an update related to adaptive streaming (i.e., DASH and CMAF).

Low Complexity Enhancement Video Coding

Low Complexity Enhancement Video Coding (LCEVC) has been promoted to committee draft (CD) which is the first milestone in the ISO/IEC standardization process. LCEVC is part two of MPEG-5 or ISO/IEC 23094-2 if you prefer the always easy-to-remember ISO codes. We have already introduced MPEG-5 in previous posts; LCEVC is a standardized video coding solution that leverages other video codecs in a manner that improves video compression efficiency while maintaining or lowering the overall encoding and decoding complexity.
The LCEVC standard uses a lightweight video codec to add up to two layers of encoded residuals. The aim of these layers is to correct artefacts produced by the base video codec and to add detail and sharpness to the final output video.
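To make the layering idea concrete, here is a minimal sketch of forming one enhancement layer as the residual between the source and the upscaled base reconstruction. The box-filter downscaler, nearest-neighbour upscaler, toy base codec, and the single (unquantized) residual layer are all simplifying assumptions of mine, not the normative LCEVC tool set.

```python
import numpy as np

class ToyBaseCodec:
    """Stand-in for a base codec such as AVC/HEVC: coarse quantization plays the role of lossy coding."""
    def encode(self, frame):
        return np.round(frame / 8).astype(np.int16)
    def decode(self, bitstream):
        return bitstream.astype(np.float32) * 8

def downscale2x(frame):
    """Simple 2x2 box-filter downscale (stand-in for the real downsampler)."""
    h, w = frame.shape
    return frame[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale2x(frame):
    """Nearest-neighbour 2x upscale (stand-in for the real upsampler)."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def encode_with_enhancement(source, base_codec):
    """Encode a downscaled frame with the base codec and derive one residual layer."""
    base_bitstream = base_codec.encode(downscale2x(source))
    base_recon = base_codec.decode(base_bitstream)   # what the base decoder would output
    residual = source - upscale2x(base_recon)        # enhancement layer: detail plus artefact correction
    return base_bitstream, residual                  # the residual would itself be transformed, quantized, and entropy-coded

def decode_with_enhancement(base_bitstream, residual, base_codec):
    """Reconstruct the full-resolution output from the base layer plus the residual."""
    return upscale2x(base_codec.decode(base_bitstream)) + residual

src = np.random.default_rng(0).random((64, 64)).astype(np.float32) * 255
bits, res = encode_with_enhancement(src, ToyBaseCodec())
out = decode_with_enhancement(bits, res, ToyBaseCodec())
print(float(np.abs(out - src).max()))   # ~0 here because the residual stays lossless in this toy
```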
The target of this standard comprises software or hardware codecs with extra processing capabilities, e.g., mobile devices, set top boxes (STBs), and personal computer based decoders. Additional benefits are the reduction in implementation complexity or a corresponding expansion in spatial resolution.
LCEVC is based on existing codecs which allows for backwards-compatibility with existing deployments. Supporting LCEVC enables “softwareized” video coding allowing for release and deployment options known from software-based solutions which are well understood by software companies and, thus, opens new opportunities in improving and optimizing video-based services and applications.
Research aspects: in video coding, research efforts are mainly related to coding efficiency and complexity (as usual). However, as MPEG-5 basically adds a software layer on top of what is typically implemented in hardware, all kinds of aspects related to software engineering could become an active area of research.

Omnidirectional Media Format

The scope of the Omnidirectional Media Format (OMAF) is about 360° video, images, audio and associated timed text and specifies (i) a coordinate system, (ii) projection and rectangular region-wise packing methods, (iii) storage of omnidirectional media and the associated metadata using ISOBMFF, (iv) encapsulation, signaling and streaming of omnidirectional media in DASH and MMT, and (v) media profiles and presentation profiles.
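Equirectangular projection, one of the projection formats within OMAF's scope, maps the sphere onto a rectangular picture; the sketch below maps a viewing direction (yaw/pitch) to pixel coordinates. The exact sample-location conventions of the standard differ in detail, so this only illustrates the basic idea.

```python
def yaw_pitch_to_equirect(yaw_deg: float, pitch_deg: float, width: int, height: int):
    """Map a viewing direction to (column, row) in an equirectangular picture.

    Yaw in [-180, 180] covers the full horizontal circle, pitch in [-90, 90]
    from nadir to zenith; the sample-position convention in OMAF differs in
    details, this is only the basic idea.
    """
    u = (yaw_deg + 180.0) / 360.0          # 0..1 across the full horizontal field
    v = (90.0 - pitch_deg) / 180.0         # 0 at the top (zenith), 1 at the bottom
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return col, row

# Example: looking straight ahead lands in the centre of a 4K ERP picture.
print(yaw_pitch_to_equirect(0.0, 0.0, 3840, 1920))   # -> (1920, 960)
```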
At this meeting, the second edition of OMAF (ISO/IEC 23090-2) has been promoted to committee draft (CD) which includes
  • support of improved overlay of graphics or textual data on top of video,
  • efficient signaling of videos structured in multiple sub-parts,
  • enabling more than one viewpoint, and
  • new profiles supporting dynamic bitstream generation according to the viewport.
As for the first edition, OMAF includes encapsulation and signaling in ISOBMFF as well as streaming of omnidirectional media (DASH and MMT). It will reach its final milestone by the end of 2020.
360° video is certainly a vital use case towards a fully immersive media experience. Devices to capture and consume such content are becoming increasingly available and will probably contribute to the dissemination of this type of content. However, it is also understood that the complexity increases significantly, specifically with respect to large-scale, scalable deployments due to increased content volume/complexity, timing constraints (latency), and quality of experience issues.
Research aspects: understanding the increased complexity of 360° video or immersive media in general is certainly an important aspect to be addressed towards enabling applications and services in this domain. We may even start thinking that 360° video actually works (e.g., it's possible to capture, upload to YouTube and consume it on many devices) but the devil is in the details when it comes to handling this complexity in an efficient way to enable a seamless and high quality of experience.

DASH and CMAF

The 4th edition of DASH (ISO/IEC 23009-1) will be published soon and MPEG is currently working towards a first amendment which will be about (i) CMAF support and (ii) event processing model. An overview of all DASH standards is depicted in the figure below, notably part one of MPEG-DASH referred to as media presentation description and segment formats.
The 2nd edition of the CMAF standard (ISO/IEC 23000-19) will become available very soon and MPEG is currently reviewing additional tools in the so-called technologies under considerations document as well as conducting various explorations. A working draft for additional media profiles is also under preparation.
Research aspects: with CMAF, low-latency support is added to DASH-like applications and services. However, the implementation specifics are actually not defined in the standard and subject to competition (e.g., here). Interestingly, the Bitmovin video developer reports from both 2018 and 2019 highlight the need for low-latency solutions in this domain.
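Rate adaptation logic is one of those implementation specifics left to competition. As a point of reference, a minimal throughput-based representation selection heuristic, with a purely hypothetical bitrate ladder, might look like this; real players add buffer-based logic, smoothing, and chunk-level measurements for low latency.

```python
def select_representation(measured_throughput_bps: float, bitrates_bps: list[int],
                          safety_factor: float = 0.8) -> int:
    """Pick the highest bitrate that fits under a safety margin of the measured throughput."""
    budget = measured_throughput_bps * safety_factor
    candidates = [b for b in sorted(bitrates_bps) if b <= budget]
    return candidates[-1] if candidates else min(bitrates_bps)

# Hypothetical ladder (bps) and a 6 Mbit/s throughput measurement:
ladder = [500_000, 1_200_000, 3_000_000, 6_000_000, 12_000_000]
print(select_representation(6_000_000, ladder))   # -> 3000000
```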
At the ACM Multimedia Conference 2019 in Nice, France I gave a tutorial entitled “A Journey towards Fully Immersive Media Access” which includes updates related to DASH and CMAF. The slides are available here.

Outlook 2020

Finally, let me try giving an outlook for 2020, not so much content-wise but events planned for 2020 that are highly relevant for this column:
... and many more!

Tuesday, April 30, 2019

MPEG news: a report from the 126th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG News Archive

The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:
  • Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
  • Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
  • Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
  • Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
  • MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
  • MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies
The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126

Three Degrees of Freedom Plus (3DoF+)

MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video

MPEG’s support for 360-degree video — also referred to as omnidirectional video — is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). It basically enables the utilization of the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content using a head mounted display (HMD). However, rendering flat 360-degree video may generate visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.

At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.

Research aspects: Subjective evaluations in the context of 3DoF+ but also immersive media services in general are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX) resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research and standards developing organizations (SDOs) and industry fora which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (and also other SDOs) often organize public workshops and there will be one during the next meeting, specifically on July 10, 2019 in Gothenburg, Sweden which will be about "Coding Technologies for Immersive Audio/Visual Experiences". Further details are available here.

Neural Network Compression for Multimedia Applications

MPEG evaluates responses to the Call for Proposal and kicks off its technical work

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to a number of clients using them in applications (e.g., mobile phones, smart cameras) requires compressed representation of neural networks.

At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for Neural Network Compression. These technologies address the compression of neural network parameters in order to reduce their size for transmission and to increase the efficiency of using them, while not or only moderately reducing their performance in specific multimedia applications.

After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).
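As a rough illustration of how such size reductions come about, here is a sketch of uniform weight quantization, a commonly used building block in neural network compression; whether it is among the three components MPEG identified is not stated here, and the bit depth, tensor shape, and lack of entropy coding are my own simplifications.

```python
import numpy as np

def quantize_weights(weights: np.ndarray, bits: int = 4):
    """Uniformly quantize float32 weights to a small integer grid.

    Storing the integer codes plus (scale, minimum) instead of float32 already
    reduces the size roughly by a factor of 32/bits before any entropy coding.
    """
    w_min, w_max = float(weights.min()), float(weights.max())
    levels = 2 ** bits - 1
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize_weights(codes, scale, w_min):
    """Map the integer codes back to approximate float weights."""
    return codes.astype(np.float32) * scale + w_min

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
codes, scale, w_min = quantize_weights(w, bits=4)
w_hat = dequantize_weights(codes, scale, w_min)
print("max abs error:", float(np.abs(w - w_hat).max()))   # bounded by ~scale/2
print("size ratio:", codes.nbytes / w.nbytes)             # 0.25 for uint8 storage; packing two 4-bit codes per byte would halve it again
```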

Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.

Low Complexity Enhancement Video Coding

MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development

MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.

The target is to achieve:
  • coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
  • coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.
This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as starting point and test model for the standard. The new standard is expected to become part of the MPEG-5 suite of codecs and its development is expected to be completed in 2020.

Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG basically addressing requirements and needs going beyond HEVC. As usual, research mainly focuses on compression efficiency, but a general trend in video coding is probably observable that favors software-based solutions rather than pure hardware coding tools. As such, complexity -- both at encoder and decoder -- as well as power efficiency are becoming important additional factors to be taken into account. Other issues are related to business aspects which are typically discussed elsewhere, e.g., here.

Point Cloud Compression

MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage

MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is appropriate especially for sparse point clouds.

MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.

G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.
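The geometry coding in G-PCC is octree-based; the sketch below only shows voxelizing a point cloud and deriving per-node occupancy bytes, leaving out the actual entropy coding and all attribute tools, so it should be read as an illustration of the underlying data structure rather than of the standard itself.

```python
import numpy as np

def octree_occupancy(points: np.ndarray, depth: int):
    """Voxelize points into a 2^depth cube and emit one occupancy byte per occupied internal node.

    Each byte holds one bit per child octant; geometry-based point cloud coding
    entropy-codes this kind of structure, heavily simplified here.
    """
    mins = points.min(axis=0)
    span = (points.max(axis=0) - mins).max() or 1.0
    voxels = np.floor((points - mins) / span * (2 ** depth - 1)).astype(np.int64)
    voxels = np.unique(voxels, axis=0)

    occupancy = []
    def recurse(vox, level):
        if level == depth or len(vox) == 0:
            return
        shift = depth - 1 - level
        octant = ((vox[:, 0] >> shift & 1) << 2) | ((vox[:, 1] >> shift & 1) << 1) | (vox[:, 2] >> shift & 1)
        byte = 0
        for child in range(8):
            if (octant == child).any():
                byte |= 1 << child
        occupancy.append(byte)
        for child in range(8):
            recurse(vox[octant == child], level + 1)
    recurse(voxels, 0)
    return occupancy

pts = np.random.default_rng(0).random((1000, 3))
occ = octree_occupancy(pts, depth=4)
print(len(occ), "occupancy bytes for", len(pts), "points")
```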

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1 and lossy coding with acceptable quality at ratios of up to 35:1.

Research aspects: After V-PCC, MPEG has now promoted G-PCC to CD but, in principle, the same research aspects are relevant as discussed here. Thus, coding efficiency is the number one performance metric, but coding complexity and power consumption also need to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically ACM MM and ACM MMSys.

MPEG Media Transport (MMT)

MPEG approves 3rd Edition of Final Draft International Standard

MMT 3rd edition will introduce two aspects:
  • enhancements for mobile environments and
  • support of Content Delivery Networks (CDNs).
The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.
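The general idea of multipath delivery can be illustrated with a toy scheduler that assigns packets to whichever connection is projected to finish transmitting them first. This heuristic is only for illustration; actual MMT multipath delivery is driven by the signalling defined in the amendment, not by any particular scheduling algorithm.

```python
from typing import Iterable

def schedule_multipath(packets: Iterable[bytes], path_rates_bps: dict[str, float]):
    """Assign each packet to the path with the smallest projected completion time (toy heuristic)."""
    busy_until = {path: 0.0 for path in path_rates_bps}     # seconds of queued transmission per path
    assignment = []
    for pkt in packets:
        # Projected finish time if this packet were queued on each path.
        finish = {p: busy_until[p] + len(pkt) * 8 / r for p, r in path_rates_bps.items()}
        best = min(finish, key=finish.get)
        busy_until[best] = finish[best]
        assignment.append(best)
    return assignment

pkts = [b"x" * 1500] * 10
print(schedule_multipath(pkts, {"cellular": 20e6, "wifi": 60e6}))
```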

Additionally, support for intelligent network entities involved in media services (i.e., the Media-Aware Network Entity (MANE)) will make MMT-based services adapt to changes of the mobile network faster and better. Understanding that support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource updates, and media requests are being added in this edition.

Ongoing developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support of FCAST in the context of MMT.

Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing existing/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question, however, remains: what is the next big thing in multimedia delivery/transport? Currently we are certainly in a phase where tools like HTTP adaptive streaming (HAS) have reached maturity and the multimedia research community is eager to work on new topics in this domain.

Let me finish this blog post with...

DASH, what else?

MPEG is working towards the 4th edition of ISO/IEC 23009-1 MPEG Dynamic Adaptive Streaming over HTTP (DASH) but I guess it won't have such a huge set of new tools added to the standard (as was the case from the 2nd to the 3rd edition). However, no public information is available so far. Other resources relevant to DASH can be found on the DASH-IF web site (guidelines, dash.js), Mile High Video 2019, ACM MMSys 2019, etc.

Thursday, May 10, 2018

MPEG news: a report from the 122nd meeting, San Diego, CA, USA

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.


The MPEG press release comprises the following topics:
  • Versatile Video Coding (VVC) project starts strongly in the Joint Video Experts Team
  • MPEG issues Call for Proposals on Network-based Media Processing
  • MPEG finalizes 7th edition of MPEG-2 Systems Standard
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with two new features
  • MPEG-G standards reach Draft International Standard for transport and compression technologies

Versatile Video Coding (VVC) – MPEG's & VCEG's new video coding project starts strong

The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16's VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include, but are not limited to, 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Interestingly, some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well beyond the targeted 50% for the final standard.

Research aspects: Compression tools and everything around them including their objective and subjective assessment. The main application areas are clearly 360-degree and HDR video. Watch out for conferences like PCS and ICIP (later this year), which will be full of papers making references to VVC. Interestingly, VVC comes with a first draft, a test model for simulation experiments, and a technology benchmark set, which is useful and important for any developments both inside and outside MPEG as it allows for reproducibility.

MPEG issues Call for Proposals on Network-based Media Processing

This Call for Proposals (CfP) addresses advanced media processing technologies such as network stitching for VR services, super resolution for enhanced visual quality, transcoding, and viewport extraction for 360-degree video within the network environment that allows service providers and end users to describe media processing operations that are to be performed by the network. The aim of network-based media processing (NBMP) is thus to allow end user devices to offload certain kinds of processing to the network. To this end, NBMP describes the composition of network-based media processing services based on a set of media processing functions and makes them accessible through Application Programming Interfaces (APIs). Responses to the NBMP CfP will be evaluated on the weekend prior to the 123rd MPEG meeting in July 2018.
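To give a flavour of what composing such functions through an API could look like, here is a purely hypothetical, heavily simplified workflow description chaining the processing steps mentioned above. The field names, function names, and the submit call are made up for illustration and do not reflect the actual NBMP data model or APIs.

```python
# Hypothetical, simplified workflow description: a chain of network-hosted
# media processing functions. Everything here is illustrative only.
workflow = {
    "input": {"type": "rtmp", "url": "rtmp://ingest.example.com/live/cam1"},
    "functions": [
        {"name": "360-stitching", "params": {"layout": "equirectangular"}},
        {"name": "super-resolution", "params": {"factor": 2}},
        {"name": "transcode", "params": {"codec": "hevc", "bitrates_kbps": [3000, 6000]}},
        {"name": "viewport-extraction", "params": {"yaw": 0, "pitch": 0, "fov": 90}},
    ],
    "output": {"type": "dash", "url": "https://cdn.example.com/live/cam1/manifest.mpd"},
}

def submit_workflow(wf: dict) -> None:
    """Placeholder for a request against a workflow-manager API exposed by the network."""
    for step in wf["functions"]:
        print("would instantiate", step["name"], "with", step["params"])

submit_workflow(workflow)
```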

Research aspects: This project reminds me a lot of what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG targets APIs rather than pure metadata formats, which is a step in the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

7th edition of MPEG-2 Systems Standard and ISO Base Media File Format (ISOBMFF) with two new features

More than 20 years after its inception, development of MPEG-2 Systems technology (i.e., transport/program stream) continues. New features include support for: (i) JPEG 2000 video with 4K resolution and ultra-low latency, (ii) media orchestration related metadata, (iii) sample variants, and (iv) HEVC tiles.

The partial file format enables the description of an ISOBMFF file partially received over lossy communication channels. This format provides tools to describe reception data, i.e., the received data together with transmission information such as received or lost byte ranges and whether the corrupted/lost bytes are present in the file, as well as repair information such as the location of the source file, possible byte offsets in that source, and the byte stream position at which a parser can try processing a corrupted file. Depending on the communication channel, this information may be set up by the receiver or through out-of-band means.
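The kind of bookkeeping described here can be sketched as a small data structure that records received byte ranges, derives the lost ones, and carries a repair hint. The class and field names below are illustrative only and do not correspond to the actual box syntax of the partial file format.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PartialFileInfo:
    """Toy model of the information a partial file carries; not the actual box syntax."""
    file_size: int
    received: list = field(default_factory=list)   # list of (offset, length) byte ranges
    source_url: Optional[str] = None               # repair hint: where the source file could be refetched from

    def add_received(self, offset: int, length: int) -> None:
        self.received.append((offset, length))

    def lost_ranges(self) -> list:
        """Complement of the received ranges: what a repair step would need to fetch."""
        gaps, cursor = [], 0
        for off, length in sorted(self.received):
            if off > cursor:
                gaps.append((cursor, off - cursor))
            cursor = max(cursor, off + length)
        if cursor < self.file_size:
            gaps.append((cursor, self.file_size - cursor))
        return gaps

info = PartialFileInfo(file_size=10_000, source_url="https://origin.example.com/clip.mp4")
info.add_received(0, 4_000)
info.add_received(6_000, 2_000)
print(info.lost_ranges())   # -> [(4000, 2000), (8000, 2000)]
```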

The second new ISOBMFF feature is sample variants (2nd edition), which are typically used to provide forensic information in the rendered sample data that can, for example, identify the specific Digital Rights Management (DRM) client which has decrypted the content. This variant framework is intended to be fully compatible with MPEG’s Common Encryption (CENC) and agnostic to the particular forensic marking system used.

Research aspects: MPEG systems standards are mainly relevant for multimedia systems research with all its characteristics. The partial file format is specifically interesting as it targets scenarios with lossy communication channels.

MPEG-G standards reach Draft International Standard for transport and compression technologies

MPEG-G provides a set of standards enabling interoperability for applications and services dealing with high-throughput deoxyribonucleic acid (DNA) sequencing. At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to Draft International Standard (DIS) stage. Such parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport including streaming of genomic data by interoperable applications. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Research aspects: the main focus of this work item is compression; transport is still in its infancy. Therefore, research on the actual delivery of compressed DNA information as well as its processing is solicited.

What else happened at MPEG122?

  • The Requirements group is exploring new video coding tools dealing with low complexity and process enhancements.
  • The activity around coded representation of neural networks has defined a set of vital use cases and is now soliciting test data until the next meeting.
  • The MP4 registration authority (MP4RA) has a new awesome web site http://mp4ra.org/.
  • MPEG-DASH is finally approving and working on the 3rd edition, comprising a consolidated version of recent amendments and corrigenda.
  • CMAF started an exploration on multi-stream support, which could be relevant for tiled streaming and multi-channel audio.
  • OMAF kicked off its activity towards a 2nd edition enabling support for 3DoF+ and social VR with the plan of going to committee draft (CD) in Oct’18. Additionally, there’s a test framework proposed, which allows assessing the performance of various OMAF tools. Its main focus is on video but MPEG’s audio subgroup has a similar framework to enable subjective testing. It could be interesting to see these two frameworks combined in one way or the other.
  • MPEG-I architectures (yes, plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closely at 3DoF+, defining common test conditions and planning a call for proposals (CfP) for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF are ongoing and an exploration on compression of dense representations of light fields has been started.
  • Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools resulting in updated versions of the test model and working draft.
Research aspects: In this section I would like to focus on DASH, CMAF, and OMAF. Multi-stream support, as mentioned above, is relevant for tiled streaming and multi-channel audio, which has been recently studied in the literature and is also highly relevant for industry. The efficient storage and streaming of this kind of content within the file format is an important aspect and often underrepresented in both research and standardization. The goal here is to keep the overhead low while maximizing the utility of the format to enable certain functionalities. OMAF now targets the social VR use case, which has been discussed in the research literature for a while and, finally, makes its way into standardization. An important aspect here is both user experience and quality of experience, which requires intensive subjective testing.

Finally, on May 10 MPEG will celebrate 30 years as its first meeting dates back to 1988 in Ottawa, Canada with around 30 attendees. The 122nd meeting had more than 500 attendees and MPEG has around 20 active work items. A total of more than 170 standards have been produced (that’s approx. six standards per year) where some standards have up to nine editions like the HEVC standard. Overall, MPEG is responsible for more than 23% of all JTC 1 standards and some of them show extraordinary longevity regarding extensions, e.g., MPEG-2 Systems (24 years), MPEG-4 file format (19 years), and AVC (15 years). MPEG standards serve billions of users (e.g., MPEG-1 video, MP2, MP3, AAC, MPEG-2, AVC, ISOBMFF, DASH). Some — more precisely five — standards have received Emmy Awards in the past (MPEG-1, MPEG-2, AVC (2x), and HEVC).
Tag cloud generated from all existing MPEG press releases.
Thus, happy birthday MPEG! In today’s society, turning 30 marks the start of the high-performance era, basically the time of “compression”, i.e., we apply everything we have learned and live it out to the fullest; a truly optimistic perspective for our generation X (millennial) standards body!

Tuesday, February 13, 2018

MPEG news: a report from the 121st meeting, Gwangju, Korea

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The MPEG press release comprises the following topics:
  • Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
  • MPEG-G standards reach Committee Draft for metadata and APIs
  • MPEG issues Calls for Visual Test Material for Immersive Applications
  • Internet of Media Things (IoMT) reaches Committee Draft level
  • MPEG finalizes its Media Orchestration (MORE) standard
At the end I will also briefly summarize what else happened with respect to DASH, CMAF, OMAF as well as discuss future aspects of MPEG.

Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level

The Committee Draft (CD) for CDVA has been approved at the 121st MPEG meeting, which is the first formal step of the ISO/IEC approval process for a new standard. This will become a new part of MPEG-7 to support video search and retrieval applications (ISO/IEC 15938-15).

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors which can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers, so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. The CDVA standard specifies descriptors that fulfil these needs and includes (i) the components of the CDVA descriptor, (ii) its bitstream representation and (iii) the extraction process. The final standard is expected to be finished in early 2019.

CDVA introduces a new descriptor based on features which are output from a Deep Neural Network (DNN). CDVA is robust against viewpoint changes and moderate transformations of the video (e.g., re-encoding, overlays), and it supports partial matching and temporal localization of the matching content. The CDVA descriptor has a typical size of 2–4 KBytes per second of video. For typical test cases, it has been demonstrated to reach a correct matching rate of 88% (at 1% false matching rate).
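In essence, matching compact descriptors comes down to comparing fixed-length feature vectors and thresholding a similarity score. The cosine-similarity sketch below only illustrates that idea and is not the normative CDVA matching procedure; descriptor dimension, threshold, and data are arbitrary.

```python
import numpy as np

def match_score(query_desc: np.ndarray, ref_desc: np.ndarray) -> float:
    """Cosine similarity between two descriptor vectors; higher means more likely a match."""
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    r = ref_desc / (np.linalg.norm(ref_desc) + 1e-12)
    return float(q @ r)

def search(query_desc: np.ndarray, database: dict[str, np.ndarray], threshold: float = 0.8):
    """Return database items whose descriptors score above a (tunable) matching threshold."""
    return [name for name, desc in database.items() if match_score(query_desc, desc) >= threshold]

rng = np.random.default_rng(0)
db = {f"clip{i}": rng.normal(size=512).astype(np.float32) for i in range(100)}
query = db["clip7"] + 0.05 * rng.normal(size=512).astype(np.float32)   # slightly perturbed copy
print(search(query, db))   # -> ['clip7'] in this toy setup
```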

Research aspects: There are probably endless research aspects in the visual descriptor space ranging from validation of the results achieved so far to further improving informative aspects with the goal of increasing the correct matching rate (and consequently decreasing the false matching rate). In general, however, the question is whether there's a need for descriptors in the era of bandwidth-storage-computing over-provisioning and the rising usage of artificial intelligence techniques such as machine learning and deep learning.

MPEG-G standards reach Committee Draft for metadata and APIs

In my previous report I introduced the MPEG-G standard for compression and transport technologies of genomic data. At the 121st MPEG meeting, metadata and APIs reached CD level. The former - metadata - provides relevant information associated with genomic data and the latter - APIs - allow for building interoperable applications capable of manipulating MPEG-G files. Additional standardization plans for MPEG-G include the CDs for reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5), which are planned to be issued at the next 122nd MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.

Research aspects: Metadata typically enables certain functionality which can be tested and evaluated against requirements. APIs allow building applications and services on top of the underlying functions, which could be a driver for research projects to make use of such APIs.

MPEG issues Calls for Visual Test Material for Immersive Applications

I have reported about the Omnidirectional Media Format (OMAF) in my previous report. At the 121st MPEG meeting, MPEG was working on extending OMAF functionalities to allow the modification of viewing positions, e.g., in case of head movements when using a head-mounted display, or for use with other forms of interactive navigation. Unlike OMAF which only provides 3 degrees of freedom (3DoF) for the user to view the content from a perspective looking outwards from the original camera position, the anticipated extension will also support motion parallax within some limited range which is referred to as 3DoF+. In the future with further enhanced technologies, a full 6 degrees of freedom (6DoF) will be achieved with changes of viewing position over a much larger range. To develop technology in these domains, MPEG has issued two Calls for Test Material in the areas of 3DoF+ and 6DoF, asking owners of image and video material to provide such content for use in developing and testing candidate technologies for standardization. Details about these calls can be found at https://mpeg.chiariglione.org/.

Research aspects: The good thing about test material is that it allows for reproducibility, which is an important aspect within the research community. Thus, it is more than appreciated that MPEG issues such a call and let's hope that this material will become publicly available. Typically this kind of visual test material targets coding, but it would also be interesting to have such test content for storage and delivery.

Internet of Media Things (IoMT) reaches Committee Draft level

The goal of IoMT is to facilitate the large-scale deployment of distributed media systems with interoperable audio/visual data and metadata exchange. This standard specifies APIs providing media things (i.e., cameras/displays and microphones/loudspeakers, possibly capable of significant processing power) with the capability of being discovered, setting-up ad-hoc communication protocols, exposing usage conditions, and providing media and metadata as well as services processing them. IoMT APIs encompass a large variety of devices, not just connected cameras and displays but also sophisticated devices such as smart glasses, image/speech analyzers and gesture recognizers. IoMT enables the expression of the economic value of resources (media and metadata) and of associated processing in terms of digital tokens leveraged by the use of blockchain technologies.
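A toy discovery flow gives a feel for what such APIs enable; everything below (class, function names, capability fields) is purely illustrative and not the IoMT API itself.

```python
from dataclasses import dataclass

@dataclass
class MediaThing:
    """Illustrative stand-in for an IoMT 'media thing'; the real APIs differ."""
    thing_id: str
    kind: str                    # e.g. "camera", "display", "microphone"
    capabilities: dict

registry: dict[str, MediaThing] = {}

def announce(thing: MediaThing) -> None:
    """A thing registers itself so that others can discover it."""
    registry[thing.thing_id] = thing

def discover(kind: str) -> list:
    """Discover all registered things of a given kind."""
    return [t for t in registry.values() if t.kind == kind]

announce(MediaThing("cam-01", "camera", {"resolution": "1920x1080", "analysis": ["face-detect"]}))
announce(MediaThing("disp-01", "display", {"resolution": "3840x2160"}))
print([t.thing_id for t in discover("camera")])   # -> ['cam-01']
```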

Research aspects: The main focus of IoMT is APIs, which provide easy and flexible access to the underlying devices' functionality and, thus, are an important factor to enable research within this interesting domain. For example, using these APIs to enable communication among these various media things could bring up new forms of interaction with these technologies.

MPEG finalizes its Media Orchestration (MORE) standard

The MPEG "Media Orchestration" (MORE) standard reached Final Draft International Standard (FDIS), the final stage of development before being published by ISO/IEC. The scope of the Media Orchestration standard is as follows:
  • It supports the automated combination of multiple media sources (i.e., cameras, microphones) into a coherent multimedia experience.
  • It supports rendering multimedia experiences on multiple devices simultaneously, again giving a consistent and coherent experience.
  • It contains tools for orchestration in time (synchronization) and space.
MPEG expects the Media Orchestration standard to be especially useful in immersive media settings. This applies notably in social virtual reality (VR) applications, where people share a VR experience and are able to communicate about it. Media Orchestration is expected to allow synchronizing the media experience for all users, and to give them a spatially consistent experience, as it is important for a social VR user to be able to understand when other users are looking at them.
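At its core, synchronizing such an experience means all devices agree on a shared media timeline. The sketch below only shows mapping a common wall clock to media time; clock synchronization, signalling, and per-stream offsets are left out, and the standard's actual orchestration tools are not modelled here.

```python
import time
from typing import Optional

class SharedTimeline:
    """Toy shared timeline: every device derives media time from a common wall clock."""
    def __init__(self, start_wallclock: float, media_offset: float = 0.0, rate: float = 1.0):
        self.start = start_wallclock
        self.offset = media_offset
        self.rate = rate

    def media_time(self, now: Optional[float] = None) -> float:
        """Media time that a device should currently be rendering."""
        now = time.time() if now is None else now
        return self.offset + (now - self.start) * self.rate

# All devices constructed with the same parameters render the same media time.
t0 = 1_700_000_000.0
timeline = SharedTimeline(start_wallclock=t0)
print(timeline.media_time(now=t0 + 12.5))   # -> 12.5 seconds into the presentation
```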

Research aspects: This standard enables the social multimedia experience proposed in the literature. Interestingly, the W3C is working on something similar referred to as the timing object and it would be interesting to see whether these approaches have some commonalities.

What else happened at the MPEG meeting?

DASH is fully in maintenance mode and we are still waiting for the 3rd edition which is supposed to be a consolidation of existing corrigenda and amendments. Currently only minor extensions are proposed and conformance/reference software is being updated. Similar things can be said for CMAF where we have one amendment and one corrigendum under development. Additionally, MPEG is working on CMAF conformance. OMAF has reached FDIS at the last meeting and MPEG is working on reference software and conformance also. It is expected that in the future we will see additional standards and/or technical reports defining/describing how to use CMAF and OMAF in DASH.

Regarding the future video codec, the call for proposals is out since the last meeting as announced in my previous report and responses are due for the next meeting. Thus, it is expected that the 122nd MPEG meeting will be the place to be in terms of MPEG’s future video codec. Speaking about the future, shortly after the 121st MPEG meeting, Leonardo Chiariglione published a blog post entitled “a crisis, the causes and a solution”, which is related to HEVC licensing, the Alliance for Open Media (AOM), and possible future options. The blog post certainly caused some reactions within the video community at large and I think this was also intended. Let’s hope it will galvanize the video industry -- not to push the button -- but to start addressing and resolving the issues. As pointed out in one of my other blog posts about what to care about in 2018, the upcoming MPEG meeting in April 2018 is certainly a place to be. Additionally, it highlights some conferences related to various aspects also discussed in MPEG which I'd like to republish here:
  • QoMEX -- Int'l Conf. on Quality of Multimedia Experience -- will be hosted in Sardinia, Italy from May 29-31, which is THE conference to be for QoE of multimedia applications and services. Submission deadline is January 15/22, 2018.
  • MMSys -- Multimedia Systems Conf. -- and specifically Packet Video, which will be on June 12 in Amsterdam, The Netherlands. Packet Video is THE adaptive streaming scientific event 2018. Submission deadline is March 1, 2018.
  • Additionally, you might be interested in ICME (July 23-27, 2018, San Diego, USA), ICIP (October 7-10, 2018, Athens, Greece; specifically in the context of video coding), and PCS (June 24-27, 2018, San Francisco, CA, USA; also in the context of video coding).
  • The DASH-IF academic track hosts special events at MMSys (Excellence in DASH Award) and ICME (DASH Grand Challenge).
  • MIPR -- 1st Int'l Conf. on Multimedia Information Processing and Retrieval -- will be in Miami, Florida, USA from April 10-12, 2018. It has a broad range of topics including networking for multimedia systems as well as systems and infrastructures.

Thursday, December 14, 2017

MPEG news: a report from the 120th meeting, Macau, China

MPEG Meeting Plenary
The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The MPEG press release comprises the following topics:
  • Point Cloud Compression – MPEG evaluates responses to call for proposal and kicks off its technical work 
  • The omnidirectional media format (OMAF) has reached its final milestone 
  • MPEG-G standards reach Committee Draft for compression and transport technologies of genomic data 
  • Beyond HEVC – The MPEG & VCEG call to set the next standard in video compression 
  • MPEG adds better support for mobile environment to MMT 
  • New standard completed for Internet Video Coding 
  • Evidence of new video transcoding technology using side streams 

Point Cloud Compression

At its 120th meeting, MPEG analysed the technologies submitted by nine industry leaders as responses to the Call for Proposals (CfP) for Point Cloud Compression (PCC). These technologies address the lossless or lossy coding of 3D point clouds with associated attributes such as colour and material properties. Point clouds are referred to as unordered sets of points in a 3D space and typically captured using various setups of multiple cameras, depth sensors, LiDAR scanners, etc., but can also be generated synthetically and are in use in several industries. They have recently emerged as representations of the real world enabling immersive forms of interaction, navigation, and communication. Point clouds are typically represented by extremely large amounts of data providing a significant barrier for mass market applications. Thus, MPEG has issued a Call for Proposal seeking technologies that allow reduction of point cloud data for its intended applications. After a formal objective and subjective evaluation campaign, MPEG selected three technologies as starting points for the test models for static, animated, and dynamically acquired point clouds. A key conclusion of the evaluation was that state-of-the-art point cloud compression can be significantly improved by leveraging decades of 2D video coding tools and combining 2D and 3D compression technologies. Such an approach provides synergies with existing hardware and software infrastructures for rapid deployment of new immersive experiences.
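The conclusion that 2D video tools can be leveraged for point clouds rests on projecting the 3D points onto planes. The single-plane orthographic projection onto a depth map below is a heavily simplified sketch of that idea, assuming an (N, 3) NumPy array of points; the actual test models use patch segmentation and multiple projection planes.

```python
import numpy as np

def project_to_depth_map(points: np.ndarray, resolution: int = 256):
    """Orthographically project (x, y, z) points onto the XY plane as a depth image.

    The resulting 2D map (plus a colour map built the same way) is what a video
    codec could then compress; this single-plane toy omits patch segmentation
    and the other projection planes used in practice.
    """
    mins = points.min(axis=0)
    span = (points.max(axis=0) - mins).max() or 1.0
    norm = (points - mins) / span                                  # normalize into the unit cube
    cols = (norm[:, 0] * (resolution - 1)).astype(int)
    rows = (norm[:, 1] * (resolution - 1)).astype(int)
    depth = np.zeros((resolution, resolution), dtype=np.float32)
    # Keep the nearest point per pixel (smallest z wins among occupied pixels).
    for r, c, z in zip(rows, cols, norm[:, 2]):
        if depth[r, c] == 0 or z < depth[r, c]:
            depth[r, c] = z
    return depth

cloud = np.random.default_rng(0).random((5000, 3))
print(project_to_depth_map(cloud).shape)   # -> (256, 256)
```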
Although the initial selection of technologies for point cloud compression was concluded at the 120th MPEG meeting, it can also be seen as a kick-off for its scientific evaluation and various further developments including the optimization thereof. It is expected that various scientific conferences will focus on point cloud compression and may open calls for grand challenges, as for example at IEEE ICME 2018.

Omnidirectional Media Format (OMAF)

The understanding of the virtual reality (VR) potential is growing but market fragmentation caused by the lack of interoperable formats for the storage and delivery of such content stifles VR’s market potential. MPEG’s recently started project referred to as Omnidirectional Media Format (OMAF) has reached Final Draft International Standard (FDIS) at its 120th meeting. It includes
  • equirectangular projection and cubemap projection as projection formats; 
  • signalling of metadata required for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data; and 
  • a selection of audio-visual codecs for this application. 
It also includes technologies to arrange video pixel data in numerous ways to improve compression efficiency and reduce the size of video, a major bottleneck for VR applications and services. The standard also includes technologies for the delivery of OMAF content with MPEG-DASH and MMT.
MPEG has defined a format comprising a minimal set of tools to enable interoperability among implementers of the standard. Various aspects are deliberately excluded from the normative parts to foster innovation leading to novel products and services. This enables us -- researchers and practitioners -- to experiment with these new formats in various ways and focus on informative aspects where typically competition can be found. For example, efficient means for encoding and packaging of omnidirectional/360-degree media content and its adaptive streaming including support for (ultra-)low latency will become a big issue in the near future.

MPEG-G: Compression and Transport Technologies of Genomic Data

The availability of high throughput DNA sequencing technologies opens new perspectives in the treatment of several diseases making possible the introduction of new global approaches in public health known as “precision medicine”. While routine DNA sequencing in the doctor’s office is still not current practice, medical centers have begun to use sequencing to identify cancer and other diseases and to find effective treatments. As DNA sequencing technologies produce extremely large amounts of data and related information, the ICT costs of storage, transmission, and processing are also very high. The MPEG-G standard addresses and solves the problem of efficient and economical handling of genomic data by providing new
  • compression technologies (ISO/IEC 23092-2) and 
  • transport technologies (ISO/IEC 23092-1), 
which reached Committee Draft level at its 120th meeting.
Additionally, the Committee Drafts for
  • metadata and APIs (ISO/IEC 23092-3) and 
  • reference software (ISO/IEC 23092-4) 
are scheduled for the next MPEG meeting and the goal is to publish Draft International Standards (DIS) at the end of 2018.
This new type of (media) content, which requires compression and transport technologies, is emerging within the multimedia community at large and, thus, input is welcome.

Beyond HEVC – The MPEG & VCEG Call to set the Next Standard in Video Compression

The 120th MPEG meeting marked the first major step toward the next generation of video coding standard in the form of a joint Call for Proposals (CfP) with ITU-T SG16’s VCEG. After two years of collaborative informal exploration studies and a gathering of evidence that successfully concluded at the 118th MPEG meeting, MPEG and ITU-T SG16 agreed to issue the CfP for future video coding technology with compression capabilities that significantly exceed those of the HEVC standard and its current extensions. They also formalized an agreement on formation of a joint collaborative team called the “Joint Video Experts Team” (JVET) to work on development of the new planned standard, pending the outcome of the CfP that will be evaluated at the 122nd MPEG meeting in April 2018. To evaluate the proposed compression technologies, formal subjective tests will be performed using video material submitted by proponents in February 2018. The CfP includes the testing of technology for 360° omnidirectional video coding and the coding of content with high dynamic range and wide colour gamut in addition to conventional standard-dynamic-range camera content. Anticipating a strong response to the call, a “test model” draft design is expected to be selected in 2018, with development of a potential new standard to be completed in late 2020.
The major goal of a new video coding standard is to be better than its predecessor (HEVC). Typically this "better" is quantified as 50%, which means that it should be possible to encode the video at the same quality with half the bitrate, or at a significantly higher quality with the same bitrate. However, at this time the “Joint Video Experts Team” (JVET) from MPEG and ITU-T SG16 faces competition from the Alliance for Open Media, which is working on AV1. In any case, we are looking forward to an exciting time frame from now until this new codec is ratified and to seeing how it will perform compared to AV1. Multimedia systems and applications will also benefit from new codecs which will gain traction as soon as first implementations of this new codec become available (note that AV1 is available as open source already and is continuously being further developed).

MPEG adds Better Support for Mobile Environment to MPEG Media Transport (MMT)

MPEG has approved the Final Draft Amendment (FDAM) to MPEG Media Transport (MMT; ISO/IEC 23008-1:2017), which is referred to as “MMT enhancements for mobile environments”. In order to reflect industry needs on MMT, which has been well adopted by broadcast standards such as ATSC 3.0 and Super Hi-Vision, it addresses several important issues on the efficient use of MMT in mobile environments. For example, it adds a distributed resource identification message to facilitate multipath delivery and a transition request message to change the delivery path of an active session. This amendment also introduces the concept of an MMT-aware network entity (MANE), which might be placed between the original server and the client, and provides a detailed description of how to use it for both improving efficiency and reducing delay of delivery. Additionally, this amendment provides a method to use WebSockets to set up and control an MMT session/presentation.

New Standard Completed for Internet Video Coding

A new standard for video coding, suitable for the internet as well as other video applications, was completed at the 120th MPEG meeting. The Internet Video Coding (IVC) standard was developed with the intention of providing the industry with an “Option 1” video coding standard. In ISO/IEC language, this refers to a standard for which patent holders have declared a willingness to grant licenses free of charge to an unrestricted number of applicants for all necessary patents on a worldwide, non-discriminatory basis and under other reasonable terms and conditions, to enable others to make, use, and sell implementations of the standard. At the time of completion of the IVC standard, the specification contained no identified necessary patent rights except those available under Option 1 licensing terms. During the development of IVC, MPEG removed from the draft standard any necessary patent rights that it was informed were not available under such Option 1 terms, and MPEG is optimistic of the outlook for the new standard. MPEG encourages interested parties to provide information about any other similar cases. The IVC standard has roughly similar compression capability as the earlier AVC standard, which has become the most widely deployed video coding technology in the world. Tests have been conducted to verify IVC’s strong technical capability, and the new standard has also been shown to have relatively modest implementation complexity requirements.

Evidence of new Video Transcoding Technology using Side Streams

Following a “Call for Evidence” (CfE) issued by MPEG in July 2017, evidence was evaluated at the 120th MPEG meeting to investigate whether video transcoding technology has been developed for transcoding assisted by side data streams that is capable of significantly reducing the computational complexity without reducing compression efficiency. The evaluations of the four responses received included comparisons of the technology against adaptive bit-rate streaming using simulcast as well as against traditional transcoding using full video re-encoding. The responses span the compression efficiency space between simulcast and full transcoding, with trade-offs between the bit rate required for distribution within the network and the bit rate required for delivery to the user. All four responses provided a substantial computational complexity reduction compared to transcoding using full re-encoding. MPEG plans to further investigate transcoding technology and is soliciting expressions of interest from industry on the need for standardization of such assisted transcoding using side data streams.

MPEG currently works on two related topics which are referred to as network-distributed video coding (NDVC) and network-based media processing (NBMP). Both activities involve the network, which is more and more evolving into a highly distributed compute and delivery platform, as opposed to a bit pipe that is supposed to deliver data as fast as possible from A to B. This phenomenon could also be interesting when looking at developments around 5G, which is actually much more than just a radio access technology. These activities are certainly worth monitoring as they basically contribute to making networked media resources accessible or even programmable. In this context, I would like to refer the interested reader to the December'17 theme of the IEEE Computer Society Computing Now, which is about Advancing Multimedia Content Distribution.
Publicly available documents from the 120th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Gwangju, Korea, January 22-26, 2018. Feel free to contact Christian Timmerer for any questions or comments.
Some of the activities reported above are considered within the Call for Papers at 23rd Packet Video Workshop (PV 2018) co-located with ACM MMSys 2018 in Amsterdam, The Netherlands. Topics of interest include (but are not limited to):
  • Adaptive media streaming, and content storage, distribution and delivery 
  • Network-distributed video coding and network-based media processing 
  • Next-generation/future video coding, point cloud compression 
  • Audiovisual communication, surveillance and healthcare systems 
  • Wireless, mobile, IoT, and embedded systems for multimedia applications 
  • Future media internetworking: information-centric networking and 5G 
  • Immersive media: virtual reality (VR), augmented reality (AR), 360° video and multi-sensory systems, and its streaming 
  • Machine learning in media coding and streaming systems 
  • Standardization: DASH, MMT, CMAF, OMAF, MiAF, WebRTC, MSE, EME, WebVR, Hybrid Media, WAVE, etc.
  • Applications: social media, game streaming, personal broadcast, healthcare, industry 4.0, education, transportation, etc. 
Important dates
  • Submission deadline: March 1, 2018 
  • Acceptance notification: April 9, 2018 
  • Camera-ready deadline: April 19, 2018