Showing posts with label isobmff. Show all posts
Showing posts with label isobmff. Show all posts

Wednesday, March 31, 2021

MPEG news: a report from the 133rd meeting (virtual)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award

The 133rd MPEG meeting was once again held as an online meeting, and this time, kicked off with great news, that MPEG is one of the organizations honored as a 72nd Annual Technology & Engineering Emmy® Awards Recipient, specifically the MPEG Systems File Format Subgroup and its ISO Base Media File Format (ISOBMFF) et al.

The official press release can be found here and comprises the following items:
  • 6th Emmy® Award for MPEG Technology: MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award
  • Essential Video Coding (EVC) verification test finalized
  • MPEG issues a Call for Evidence on Video Coding for Machines
  • Neural Network Compression for Multimedia Applications – MPEG calls for technologies for incremental coding of neural networks
  • MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)
  • MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)
  • MPEG Systems reached the first milestone to carry event messages in tracks of the ISO Base Media File Format
In this report, I’d like to focus on ISOBMFF, EVC, CMAF, and DASH.

MPEG Systems File Format Subgroup wins Technology & Engineering Emmy® Award

MPEG is pleased to report that the File Format subgroup of MPEG Systems is being recognized this year by the National Academy for Television Arts and Sciences (NATAS) with a Technology & Engineering Emmy® for their 20 years of work on the ISO Base Media File Format (ISOBMFF). This format was first standardized in 1999 as part of the MPEG-4 Systems specification and is now in its 6th edition as ISO/IEC 14496-12. It has been used and adopted by many other specifications, e.g.:
  • MP4 and 3GP file formats;
  • Carriage of NAL unit structured video in the ISO Base Media File Format, which provides support for AVC, HEVC, VVC, EVC, and probably soon LCEVC;
  • MPEG-21 file format;
  • Dynamic Adaptive Streaming over HTTP (DASH) and Common Media Application Format (CMAF);
  • High-Efficiency Image Format (HEIF);
  • Timed text and other visual overlays in ISOBMFF;
  • Common encryption format;
  • Carriage of timed metadata metrics of media;
  • Derived visual tracks;
  • Event message track format;
  • Carriage of uncompressed video;
  • Omnidirectional Media Format (OMAF);
  • Carriage of visual volumetric video-based coding data;
  • Carriage of geometry-based point cloud compression data;
  • … to be continued!
This is MPEG’s fourth Technology & Engineering Emmy® Award (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, and MPEG-2 Transport Stream in 2013) and sixth overall Emmy® Award, including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017, respectively.

Essential Video Coding (EVC) verification test finalized

At the 133rd MPEG meeting, a verification testing assessment of the Essential Video Coding (EVC) standard was completed. The first part of the EVC verification test using high dynamic range (HDR) and wide color gamut (WCG) was completed at the 132nd MPEG meeting. A subjective quality evaluation was conducted comparing the EVC Main profile to the HEVC Main 10 profile and the EVC Baseline profile to AVC High 10 profile, respectively:
  • Analysis of the subjective test results showed that the average bitrate savings for EVC Main profile are approximately 40% compared to HEVC Main 10 profile, using UHD and HD SDR content encoded in both random access and low delay configurations.
  • The average bitrate savings for the EVC Baseline profile compared to the AVC High 10 profile is approximately 40% using UHD SDR content encoded in the random-access configuration and approximately 35% using HD SDR content encoded in the low delay configuration.
  • Verification test results using HDR content had shown average bitrate savings for EVC Main profile of approximately 35% compared to HEVC Main 10 profile.
By providing significantly improved compression efficiency compared to HEVC and earlier video coding standards while encouraging the timely publication of licensing terms, the MPEG-5 EVC standard is expected to meet the market needs of emerging delivery protocols and networks, such as 5G, enabling the delivery of high-quality video services to an ever-growing audience.

In addition to verification tests, EVC, along with VVC and CMAF were subject to further improvements to their support systems.

Research aspects: as for every new video codec, its compression efficiency and computational complexity are important performance metrics. Additionally, the availability of (efficient) open-source implementations (i.e., x264, x265, soon x266, VVenC, aomenc, et al., etc.) are vital for its adoption in the (academic) research community.

MPEG Systems reaches the first milestone for supporting Versatile Video Coding (VVC) and Essential Video Coding (EVC) in the Common Media Application Format (CMAF)

At the 133rd MPEG meeting, MPEG Systems promoted Amendment 2 of the Common Media Application Format (CMAF) to Committee Draft Amendment (CDAM) status, the first major milestone in the ISO/IEC approval process. This amendment defines:
  • constraints to (i) Versatile Video Coding (VVC) and (ii) Essential Video Coding (EVC) video elementary streams when carried in a CMAF video track;
  • codec parameters to be used for CMAF switching sets with VVC and EVC tracks; and
  • support of the newly introduced MPEG-H 3D Audio profile.
It is expected to reach its final milestone in early 2022. For research aspects related to CMAF, the reader is referred to the next section about DASH.

MPEG Systems continuously enhances Dynamic Adaptive Streaming over HTTP (DASH)

At the 133rd MPEG meeting, MPEG Systems promoted Part 8 of Dynamic Adaptive Streaming over HTTP (DASH), also referred to as “Session-based DASH,” to its final stage of standardization (i.e., Final Draft International Standard (FDIS)).

Historically, in DASH, every client uses the same Media Presentation Description (MPD), as it best serves the service's scalability. However, there have been increasing requests from the industry to enable customized manifests for enabling personalized services. MPEG Systems has standardized a solution to this problem without sacrificing scalability. Session-based DASH adds a mechanism to the MPD to refer to another document, called Session-based Description (SBD), allowing per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests.

An updated overview of DASH standards/features can be found in the Figure below.
MPEG DASH Status as of January 2021.

Research aspects: CMAF is mostly like becoming the main segment format to be used in the context of HTTP adaptive streaming (HAS) and, thus, also DASH (hence also the name common media application format). Supporting a plethora of media coding formats will inevitably result in a multi-codec dilemma that needs to be addressed soon as there will be no flag day where everyone will switch to a new coding format. Thus, designing efficient bitrate ladders for multi-codec delivery will an interesting research aspect, which needs to include device/player support (i.e., some devices/player will support only a subset of available codecs), storage capacity/costs within the cloud as well as within the delivery network, and network distribution capacity/costs (i.e., CDN costs).

The 134th MPEG meeting will be again an online meeting in April 2021. Click here for more information about MPEG meetings and their developments.

Wednesday, May 27, 2020

MPEG news: a report from the 130th meeting, Alpbach, Austria (virtual)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.

The 130th MPEG meeting concluded on April 24, 2020, in Alpbach, Austria ... well, not exactly, unfortunately. The 130th MPEG meeting concluded on April 24, 2020, but not in Alpbach, Austria.

I attended the 130th MPEG meeting remotely.
Because of the Covid-19 pandemic, the 130th MPEG meeting has been converted from a physical meeting to a fully online meeting, the first in MPEG’s 30+ years of history. Approximately 600 experts attending from 19 time zones worked in tens of Zoom meeting sessions supported by an online calendar and by collaborative tools that involved MPEG experts in both online and offline sessions. For example, input contributions had to be registered and uploaded ahead of the meeting to allow for efficient scheduling of two-hour meeting slots, which have been distributed from early morning to late night in order to accommodate experts working in different time zones as mentioned earlier. These input contributions have been then mapped to GitLab issues for offline discussions and the actual meeting slots have been primarily used for organizing the meeting, resolving conflicts, and making decisions including approving output documents. Although the productivity of the online meeting could not reach the level of regular face-to-face meetings, the results posted in the press release show that MPEG experts managed the challenge quite well, specifically
  • MPEG ratifies MPEG-5 Essential Video Coding (EVC) standard;
  • MPEG issues the Final Draft International Standards for parts 1, 2, 4, and 5 of MPEG-G 2nd edition;
  • MPEG expands the coverage of ISO Base Media File Format (ISOBMFF) family of standards;
  • A new standard for large scale client-specific streaming with MPEG-DASH;
Other Important Activities at the 130th MPEG meeting: (i) the carriage of visual volumetric video-based coding data, (ii) Network-Based Media Processing (NBMP) function templates, (iii) the conversion from MPEG-21 contracts to smart contracts, (iv) deep neural network based video coding, (v) Low Complexity Enhancement Video Coding (LCEVC) reaching DIS stage, and (vi) a new level of the MPEG-4 Audio ALS Simple Profile for high-resolution audio among others

The corresponding press release of the 130th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/130. This report focused on video coding (EVC) and systems aspects (file format, DASH).

MPEG ratifies MPEG-5 Essential Video Coding Standard

At its 130th meeting, MPEG announced the completion of the new ISO/IEC 23094-1 standard which is referred to as MPEG-5 Essential Video Coding (EVC) and has been promoted to Final Draft International Standard (FDIS) status. There is a constant demand for more efficient video coding technologies (e.g., due to the increased usage of video on the internet), but coding efficiency is not the only factor determining the industry's choice of video coding technology for products and services. The EVC standard offers improved compression efficiency compared to existing video coding standards and is based on the statements of all contributors to the standard who have committed announcing their license terms for the MPEG-5 EVC standard no later than two years after the FDIS publication date.

The MPEG-5 EVC defines two important profiles, including "Baseline profile" and "Main profile". The "Baseline profile" contains only technologies that are older than 20 years or otherwise freely available for use in the standard. In addition, the "Main profile" adds a small number of additional tools, each of which can be either cleanly disabled or switched to the corresponding baseline tool on an individual basis.

It will be interesting to see how EVC profiles (baseline and main) will find its path into products and services given the existing number of codecs already in use (e.g., AVC, HEVC, VP9, AV1) and those still under development but being close to ratification (e.g., VVC, LCEVC). That is, in total we, may end up with about seven video coding formats that probably need to be considered for future video products and services. In other words, the multi-codec scenario I have envisioned some time ago is becoming reality raising some interesting challenges to be addressed in the future.

Research aspects: as for all video coding standards, the most important research aspect is certainly coding efficiency. For EVC it might be also interesting to research its usability of the built-in tool switching mechanism within a practical setup. Furthermore, the multi-codec issue, the ratification of EVC adds another facet to the already existing video coding standards in use or/and under development.

MPEG expands the Coverage of ISO Base Media File Format (ISOBMFF) Family of Standards

At the 130th WG11 (MPEG) meeting, the ISOBMFF family of standards has been significantly amended with new tools and functionalities. The standards in question are as follows:
  • ISO/IEC 14496-12: ISO Base Media File Format;
  • ISO/IEC 14496-15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format;
  • ISO/IEC 23008-12: Image File Format; and
  • ISO /IEC 23001-16: Derived visual tracks in the ISO base media file format.
In particular, three new amendments to the ISOBMFF family have reached their final milestone, i.e., Final Draft Amendment (FDAM):
  1. Amendment 4 to ISO/IEC 14496-12 (ISO Base Media File Format) allows the use of a more compact version of metadata for movie fragments;
  2. Amendment 1 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) adds support of HEVC slice segment data track and additional extractor types for HEVC such as track reference and track groups; and
  3. Amendment 2 to ISO/IEC 23008-12 (Image File Format) adds support for more advanced features related to the storage of short image sequences such as burst and bracketing shots.
At the same time, new amendments have reached their first milestone, i.e., Committee Draft Amendment (CDAM):
  1. Amendment 2 to ISO/IEC 14496-15 (Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format) extends its scope to newly developed video coding standards such as Essential Video Coding (EVC) and Versatile Video Coding (VVC); and
  2. the first edition of ISO/IEC 23001-16 (Derived visual tracks in the ISO base media file format) allows a new type of visual track whose content can be dynamically generated at the time of presentation by applying some operations to the content in other tracks, such as crossfading over two tracks.
Both are expected to reach their final milestone in mid-2021.

Finally, the final text for the ISO/IEC 14496-12 6th edition Final Draft International Standard (FDIS) is now ready for the ballot after converting MP4RA to the Maintenance Agency. WG11 (MPEG) notes that Apple Inc. has been appointed as the Maintenance Agency and MPEG appreciates its valuable efforts for the many years while already acting as the official registration authority for the ISOBMFF family of standards, i.e., MP4RA (https://mp4ra.org/). The 6th edition of ISO/IEC 14496-12 is expected to be published by ISO by the end of this year.

Research aspects: the ISOBMFF family of standards basically offers certain tools and functionalities to satisfy the given use case requirements. The task of the multimedia systems research community could be to scientifically validate these tools and functionalities with respect to the use cases and maybe even beyond, e.g., try to adopt these tools and functionalities for novel applications and services.

A New Standard for Large Scale Client-specific Streaming with DASH

Historically, in ISO/IEC 23009 (Dynamic Adaptive Streaming over HTTP; DASH), every client has used the same Media Presentation Description (MPD) as it best serves the scalability of the service (e.g., for efficient cache efficiency in content delivery networks). However, there have been increasing requests from the industry to enable customized manifests for more personalized services. Consequently, MPEG has studied a solution to this problem without sacrificing scalability, and it has reached the first milestone of its standardization at the 130th MPEG meeting.

ISO/IEC 23009-8 adds a mechanism to the Media Presentation Description (MPD) to refer to another document, called Session-based Description (SBD), which allows per-session information. The DASH client can use this information (i.e., variables and their values) provided in the SBD to derive the URLs for HTTP GET requests. This standard is expected to reach its final milestone in mid-2021.

Research aspects: SBD's goal is to enable personalization while maintaining scalability which calls for a tradeoff, i.e., which kind of information to put into the MPD and what should be conveyed within the SBD. This tradeoff per se could be considered already a research question that will be hopefully addressed in the near future.

An overview of the current status of MPEG-DASH can be found in the figure below.
The next MPEG meeting will be from June 29th to July 3rd and will be again an online meeting. I am looking forward to a productive AhG period and an online meeting later this year. I am sure that MPEG will further improve its online meeting capabilities and can certainly become a role model for other groups within ISO/IEC and probably also beyond.

Wednesday, August 21, 2019

MPEG news: a report from the 127th meeting, Gothenburg, Sweden

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.

MPEG News Archive

Plenary of the 127th MPEG Meeting in Gothenburg, Sweden.
The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:
  • Versatile Video Coding (VVC) enters formal approval stage, experts predict 35-60% improvement over HEVC
  • Essential Video Coding (EVC) promoted to Committee Draft
  • Common Media Application Format (CMAF) 2nd edition promoted to Final Draft International Standard
  • Dynamic Adaptive Streaming over HTTP (DASH) 4th edition promoted to Final Draft International Standard
  • Carriage of Point Cloud Data Progresses to Committee Draft
  • JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition
  • Genomic information representation – WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • ISO/IEC 23005 (MPEG-V) 4th Edition – WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

The corresponding press release of the 127th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/127

Versatile Video Coding (VVC)

The Moving Picture Experts Group (MPEG) is pleased to announce that Versatile Video Coding (VVC) progresses to Committee Draft, experts predict 35-60% improvement over HEVC.

The development of the next major generation of video coding standard has achieved excellent progress, such that MPEG has approved the Committee Draft (CD, i.e., the text for formal balloting in the ISO/IEC approval process).

The new VVC standard will be applicable to a very broad range of applications and it will also provide additional functionalities. VVC will provide a substantial improvement in coding efficiency relative to existing standards. The improvement in coding efficiency is expected to be quite substantial – e.g., in the range of 35–60% bit rate reduction relative to HEVC although it has not yet been formally measured. Relative to HEVC means for equivalent subjective video quality at picture resolutions such as 1080p HD or 4K or 8K UHD, either for standard dynamic range video or high dynamic range and wide color gamut content for levels of quality appropriate for use in consumer distribution services. The focus during the development of the standard has primarily been on 10-bit 4:2:0 content, and 4:4:4 chroma format will also be supported.

The VVC standard is being developed in the Joint Video Experts Team (JVET), a group established jointly by MPEG and the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. In addition to a text specification, the project also includes the development of reference software, a conformance testing suite, and a new standard ISO/IEC 23002-7 specifying supplemental enhancement information messages for coded video bitstreams. The approval process for ISO/IEC 23002-7 has also begun, with the issuance of a CD consideration ballot.

Research aspects: VVC represents the next generation video codec to be deployed in 2020+ and basically the same research aspects apply as for previous generations, i.e., coding efficiency, performance/complexity, and objective/subjective evaluation. Luckily, JVET documents are freely available including the actual standard (committee draft), software (and its description), and common test conditions. Thus, researcher utilizing these resources are able to conduct reproducible research when contributing their findings and code improvements back to the community at large. 

Essential Video Coding (EVC)

MPEG-5 Essential Video Coding (EVC) promoted to Committee Draft

Interestingly, at the same meeting as VVC, MPEG promoted MPEG-5 Essential Video Coding (EVC) to Committee Draft (CD). The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics.

The MPEG-5 EVC standards includes a baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be royalty-free. Additionally, a main profile adds a small number of additional tools, each providing significant performance gain. All main profile tools are capable of being individually switched off or individually switched over to a corresponding baseline tool. Organizations making proposals for the main profile have agreed to publish applicable licensing terms within two years of FDIS stage, either individually or as part of a patent pool.

Research aspects: Similar research aspects can be described for EVC and from a software engineering perspective it could be also interesting to further investigate this switching mechanism of individual tools or/and fall back option to baseline tools. Naturally, a comparison with next generation codecs such as VVC is interesting per se. The licensing aspects itself are probably interesting for other disciplines but that is another story...

Common Media Application Format (CMAF)

MPEG ratified the 2nd edition of the Common Media Application Format (CMAF)

The Common Media Application Format (CMAF) enables efficient encoding, storage, and delivery of digital media content (incl. audio, video, subtitles among others), which is key to scaling operations to support the rapid growth of video streaming over the internet. The CMAF standard is the result of widespread industry adoption of an application of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices within CMAF.

The 2nd edition of CMAF adds support for a number of specifications that were a result of significant industry interest. Those include
  • Advanced Audio Coding (AAC) multi-channel;
  • MPEG-H 3D Audio;
  • MPEG-D Unified Speech and Audio Coding (USAC);
  • Scalable High Efficiency Video Coding (SHVC);
  • IMSC 1.1 (Timed Text Markup Language Profiles for Internet Media Subtitles and Captions); and
  • additional HEVC video CMAF profiles and brands.
This edition also introduces CMAF supplemental data handling as well as new structural brands for CMAF that reflects the common practice of the significant deployment of CMAF in industry. Companies adopting CMAF technology will find the specifications introduced in the 2nd Edition particularly useful for further adoption and proliferation of CMAF in the market.

Research aspects: see below (DASH).

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG approves the 4th edition of Dynamic Adaptive Streaming over HTTP (DASH)

The 4th edition of MPEG-DASH comprises the following features:
service description that is intended by the service provider on how the service is expected to be consumed;
  • a method to indicate the times corresponding to the production of associated media;
  • a mechanism to signal DASH profiles and features, employed codec and format profiles; and
  • supported protection schemes present in the Media Presentation Description (MPD).
It is expected that this edition will be published later this year. 

Research aspects: CMAF 2nd and DASH 4th edition come along with a rich feature set enabling a plethora of use cases. The underlying principles are still the same and research issues arise from updated application and service requirements with respect to content complexity, time aspects (mainly delay/latency), and quality of experience (QoE). The DASH-IF awards the excellence in DASH award at the ACM Multimedia Systems conference and an overview about its academic efforts can be found here. For example, see here our recent research on bandwidth prediction in low-latency chunked streaming. Additionally, our tutorial at ACM Multimedia 2019 about a journey towards fully immersive media access reviews state of the art in this area and how it could be extended enabling 6DoF HAS services through point cloud compression.

Carriage of Point Cloud Data

MPEG progresses the Carriage of Point Cloud Data to Committee Draft

At its 127th meeting, MPEG has promoted the carriage of point cloud data to the Committee Draft stage, the first milestone of ISO standard development process. This standard is the first one introducing the support of volumetric media in the industry-famous ISO base media file format family of standards.

This standard supports the carriage of point cloud data comprising individually encoded video bitstreams within multiple file format tracks in order to support the intrinsic nature of the video-based point cloud compression (V-PCC). Additionally, it also allows the carriage of point cloud data in one file format track for applications requiring multiplexed content (i.e., the video bitstream of multiple components is interleaved into one bitstream).

This standard is expected to support efficient access and delivery of some portions of a point cloud object considering that in many cases that entire point cloud object may not be visible by the user depending on the viewing direction or location of the point cloud object relative to other objects. It is currently expected that the standard will reach its final milestone by the end of 2020.

Research aspects: MPEG's Point Cloud Compression (PCC) comes in two flavors, video- and geometric-based but still requires to be packaged into file and delivery formats. MPEG's choice here is the ISO base media file format and the efficient carriage of point cloud data is characterized by both functionality (i.e., enabling the required used cases) and performance (such as low overhead).

MPEG 2 Systems/Transport Stream

JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition

At its 127th meeting, WG11 (MPEG) has extended ISO/IEC 13818-1 (MPEG-2 Systems) – in collaboration with WG1 (JPEG) – to support ISO/IEC 21122 (JPEG XS) in order to support industries using still image compression technologies for broadcasting infrastructures. The specification defines a JPEG XS elementary stream header and specifies how the JPEG XS video access unit (specified in ISO/IEC 21122-1) is put into a Packetized Elementary Stream (PES). Additionally, the specification also defines how the System Target Decoder (STD) model can be extended to support JPEG XS video elementary streams.

Genomic information representation

WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that have to be stored, transferred and analyzed. So far WG 11 (MPEG) and ISO TC 276/WG 5 have addressed the representation, compression and transport of genome sequencing data by developing the ISO/IEC 23092 standard series also known as MPEG-G. They provide a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for the access of sequencing data in the native compressed format.

An important element in the effective usage of sequencing data is the association of the data with the results of the analysis and annotations that are generated by processing pipelines and analysts. At the moment such association happens as a separate step, standard and effective ways of linking data and meta information derived from sequencing data are not available.

At its 127th meeting, MPEG and ISO TC 276/WG 5 issued a joint Call for Proposals (CfP) addressing the solution of such problem. The call seeks submissions of technologies that can provide efficient representation and compression solutions for the processing of genomic annotation data.

Companies and organizations are invited to submit proposals in response to this call. Responses are expected to be submitted by the 8th January 2020 and will be evaluated during the 129th WG 11 (MPEG) meeting. Detailed information, including how to respond to the call for proposals, the requirements that have to be considered, and the test data to be used, is reported in the documents N18648, N18647, and N18649 available at the 127th meeting website (http://mpeg.chiariglione.org/meetings/127). For any further question about the call, test conditions, required software or test sequences please contact: Joern Ostermann, MPEG Requirements Group Chair (ostermann@tnt.uni-hannover.de) or Martin Golebiewski, Convenor ISO TC 276/WG 5 (martin.golebiewski@h-its.org).

ISO/IEC 23005 (MPEG-V) 4th Edition

WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

At its 127th meeting, WG11 (MPEG) promoted the 4th edition of two parts of ISO/IEC 23005 (MPEG-V; Media Context and Control) standards to the Final Draft International Standard (FDIS). The new edition of ISO/IEC 23005-1 (architecture) enables ten new use cases, which can be grouped into four categories: 3D printing, olfactory information in virtual worlds, virtual panoramic vision in car, and adaptive sound handling. The new edition of ISO/IEC 23005-7 (conformance and reference software) is updated to reflect the changes made by the introduction of new tools defined in other parts of ISO/IEC 23005. More information on MPEG-V and its parts 1-7 can be found at https://mpeg.chiariglione.org/standards/mpeg-v.

Finally, the unofficial highlight of the 127th MPEG meeting we certainly found while scanning the scene in Gothenburg on Tuesday night...





Monday, February 18, 2019

MPEG news: a report from the 125th meeting, Marrakesh, Morocco

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.


The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:
  • Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
  • 3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
  • MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
  • ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
  • MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description
The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as the incurred media processing requires offloading complex media processing operations to the cloud/network to keep receiver hardware simple and power consumption low. Therefore, NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (+ support for 5G core and edge computing) and (ii) a workflow manager to enable composition of multiple media processing tasks (i.e., process incoming media and metadata from a media source and produce processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard now reached Committee Draft (CD) stage and final milestone is targeted for early 2020.

In particular, a standard like NBMP might become handy in the context of 5G in combination with mobile edge computing (MEC) which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context conditions and needs, hopefully leading to a better quality and user experience.

For further research aspects please see one of my previous posts

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on such a new video coding standard that should have two profiles: (i) royalty-free baseline profile and (ii) main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or else switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there’s the need to support AVC, HEVC, VP9, and AV1; soon VVC will become important. In other words, we already have a multi-codec environment to support and one might argue one more codec is probably not a big issue. The main benefit of EVC will be a royalty-free baseline profile but with AV1 there’s already such a codec available and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will be also interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as basis for many file (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files are available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.

Wednesday, August 15, 2018

MPEG news: a report from the 123rd meeting, Ljubljana, Slovenia

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.
The MPEG press release comprises the following topics:

  • MPEG issues Call for Evidence on Compressed Representation of Neural Networks
  • Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
  • MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
  • MPEG releases software for MPEG-I visual activities
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation and many other fields. Their recent success is based on the feasibility of processing much larger and complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in a quite large size (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a larger number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case, in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.
At its 123rd meeting, MPEG has issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, and memory consumption and the impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Visual Analysis) and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: As this is about "compression" of structured data, research aspects will mainly focus around compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms may become relevant in the (near) future. Furthermore, additional use cases should be communicated towards MPEG until the next meeting.

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, or viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by the entities in the networks. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and makes these NBMP services accessible through Application Programming Interfaces (APIs).
NBMP will support the existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. MPEG issued a Call for Proposal (CfP) seeking technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.
After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.
Research aspects: I reported about NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123nd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along X, Y and Z axes), and 6DoF (3DoF with full translational movements along X, Y and Z axes) experiences but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.
Research aspects: ISO/IEC technical reports are typically publicly available and provides informative descriptions of what the standard is about. In MPEG-I this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)) and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest for the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six degrees of freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+ with slight body and head movements in a sitting position to 6DoF allowing some walking steps from a central position. At the 123nd MPEG meeting, an important progress has been achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, allowing to synthesize virtual viewpoints from an unlimited number of input views. RVS integrates code bases from Universite Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+ expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: MPEG does not only produce text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. Thus, it is very much appreciated to have a new Reference View Synthesizer (RVS 2.0) and Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available which enables interoperability and reproducibility of R&D efforts/results in this area.

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF has reached the first milestone. Amendment 2 to ISO/IEC 14496-12 6th edition will add the option to have relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files and will allow creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment reached its first milestone is the first amendment to ISO/IEC 23001-7 3rd edition. It will allow use of multiple keys to a single sample and scramble some parts of AVC or HEVC video bitstreams without breaking conformance to the existing decoders. That is, the bitstream will be decodable by existing decoders, but some parts of the video will be scrambled. It is expected that these amendments will reach the final milestone in Q3 2019.
Research aspects: The ISOBMFF reference software is now available on Github, which is a valuable service to the community and allows for active standard's participation even from outside of MPEG. It is recommended that interested parties have a look at it and consider contributing to this project.

What else happened at #MPEG123?

  • The MPEG-DASH 3rd edition is finally available as output document (N17813; only available to MPEG members) combining 2nd edition, four amendments, and 2 corrigenda. We expect final publication later this year or early next year.
  • There is a new DASH amendment and corrigenda items in pipeline which should progress to final stages also some time next year. The status of MPEG-DASH (July 2018) can be seen below.
  • MPEG received a rather interesting input document related to “streaming first” which resulted into a publicly available output document entitled “thoughts on adaptive delivery and access to immersive media”. The key idea here is to focus on streaming (first) rather than on file/encapsulation formats typically used for storage (and streaming second). This document should become available here.
  • Since a couple of meetings, MPEG maintains a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It definitely worth keeping this in mind when defining/updating your own roadmap.
  • JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG internal but JVET-documents are publicly accessible here: http://phenix.it-sudparis.eu/jvet/. An interesting aspect is that VTM2/WD2 should have >20% rate reduction compared to HEVC, all with reasonable complexity and the next benchmark set (BMS) should have close to 30% rate reduction vs. HEVC. Further improvements expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
  • In addition to VVC -- which is a joint activity with VCEG --, MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low complexity enhancement video codec. Both topics will be further discussed within respective Ad-hoc Groups (AhGs) and further details are available here.
  • Finally, MPEG established an Ad-hoc Group (AhG) dedicated to the long-term planning which is also looking into application areas/domains other than media coding/representation.
In this context it is probably worth mentioning the following DASH awards at recent conferences
Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting

Thursday, May 10, 2018

MPEG news: a report from the 122nd meeting, San Diego, CA, USA

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.


The MPEG press release comprises the following topics:
  • Versatile Video Coding (VVC) project starts strongly in the Joint Video Experts Team
  • MPEG issues Call for Proposals on Network-based Media Processing
  • MPEG finalizes 7th edition of MPEG-2 Systems Standard
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with two new features
  • MPEG-G standards reach Draft International Standard for transport and compression technologies

Versatile Video Coding (VVC) – MPEG’ & VCEG’s new video coding project starts strong

The Joint Video Experts Team (JVET), a collaborative team formed by MPEG and ITU-T Study Group 16’s VCEG, commenced work on a new video coding standard referred to as Versatile Video Coding (VVC). The goal of VVC is to provide significant improvements in compression performance over the existing HEVC standard (i.e., typically twice as much as before) and to be completed in 2020. The main target applications and services include — but not limited to — 360-degree and high-dynamic-range (HDR) videos. In total, JVET evaluated responses from 32 organizations using formal subjective tests conducted by independent test labs. Interestingly, some proposals demonstrated compression efficiency gains of typically 40% or more when compared to using HEVC. Particular effectiveness was shown on ultra-high definition (UHD) video test material. Thus, we may expect compression efficiency gains well-beyond the targeted 50% for the final standard.

Research aspects: Compression tools and everything around it including its objective and subjective assessment. The main application area is clearly 360-degree and HDR. Watch out conferences like PCS and ICIP (later this year), which will be full of papers making references to VVC. Interestingly, VVC comes with a first draft, a test model for simulation experiments, and a technology benchmark set which is useful and important for any developments for both inside and outside MPEG as it allows for reproducibility.

MPEG issues Call for Proposals on Network-based Media Processing

This Call for Proposals (CfP) addresses advanced media processing technologies such as network stitching for VR service, super resolution for enhanced visual quality, transcoding, and viewport extraction for 360-degree video within the network environment that allows service providers and end users to describe media processing operations that are to be performed by the network. Therefore, the aim of network-based media processing (NBMP) is to allow end user devices to offload certain kinds of processing to the network. Therefore, NBMP describes the composition of network-based media processing services based on a set of media processing functions and makes them accessible through Application Programming Interfaces (APIs). Responses to the NBMP CfP will be evaluated on the weekend prior to the 123rd MPEG meeting in July 2018.

Research aspects: This project reminds me a lot about what has been done in the past in MPEG-21, specifically Digital Item Adaptation (DIA) and Digital Item Processing (DIP). The main difference is that MPEG targets APIs rather than pure metadata formats, which is a step forward into the right direction as APIs can be implemented and used right away. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

7th edition of MPEG-2 Systems Standard and ISO Base Media File Format (ISOBMFF) with two new features

More than 20 years since its inception development of MPEG-2 systems technology (i.e., transport/program stream) continues. New features include support for: (i) JPEG 2000 video with 4K resolution and ultra-low latency, (ii) media orchestration related metadata, (iii) sample variance, and (iv) HEVC tiles.

The partial file format enables the description of an ISOBMFF file partially received over lossy communication channels. This format provides tools to describe reception data, the received data and document transmission information such as received or lost byte ranges and whether the corrupted/lost bytes are present in the file and repair information such as location of the source file, possible byte offsets in that source, byte stream position at which a parser can try processing a corrupted file. Depending on the communication channel, this information may be setup by the receiver or through out-of-band means.

ISOBMFF's sample variants (2nd edition), which are typically used to provide forensic information in the rendered sample data that can, for example, identify the specific Digital Rights Management (DRM) client which has decrypted the content. This variant framework is intended to be fully compatible with MPEG’s Common Encryption (CENC) and agnostic to the particular forensic marking system used.

Research aspects: MPEG systems standards are mainly relevant for multimedia systems research with all its characteristics. The partial file format is specifically interesting as it targets scenarios with lossy communication channels.

MPEG-G standards reach Draft International Standard for transport and compression technologies

MPEG-G provides a set of standards enabling interoperability for applications and services dealing with high-throughput deoxyribonucleic acid (DNA) sequencing. At its 122nd meeting, MPEG promoted its core set of MPEG-G specifications, i.e., transport and compression technologies, to Draft International Standard (DIS) stage. Such parts of the standard provide new transport technologies (ISO/IEC 23092-1) and compression technologies (ISO/IEC 23092-2) supporting rich functionality for the access and transport including streaming of genomic data by interoperable applications. Reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5) will reach this stage in the next 12 months.

Research aspects: the main focus of this work item is compression and transport is still in its infancy. Therefore, research on the actual delivery for compressed DNA information as well as its processing is solicited.

What else happened at MPEG122?

  • Requirements is exploring new video coding tools dealing with low-complexity and process enhancements.
  • The activity around coded representation of neural networks has defined a set of vital use cases and is now calling for test data to be solicited until the next meeting.
  • The MP4 registration authority (MP4RA) has a new awesome web site http://mp4ra.org/.
  • MPEG-DASH is finally approving and working the 3rd edition comprising consolidated version of recent amendments and corrigenda.
  • CMAF started an exploration on multi-stream support, which could be relevant for tiled streaming and multi-channel audio.
  • OMAF kicked-off its activity towards a 2nd edition enabling support for 3DoF+ and social VR with the plan going to committee draft (CD) in Oct’18. Additionally, there’s a test framework proposed, which allows to assess performance of various OMAF tools. Its main focus is on video but MPEG’s audio subgroup has a similar framework to enable subjective testing. It could be interesting seeing these two frameworks combined in one way or the other.
  • MPEG-I architectures (yes plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closer at 3DoF+ defining common test conditions and a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF and compression of dense representation of light fields are ongoing and have been started, respectively.
  • Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools resulting into updated versions of the test model and working draft.
Research aspects: In this section I would like to focus on DASH, CMAF, and OMAF. Multi-stream support, as mentioned above, is relevant for tiled streaming and multi-channel audio which has been recently studied in the literature and is also highly relevant for industry. The efficient storage and streaming of such kind of content within the file format is an important aspect and often underrepresented in both research and standardization. The goal here is to keep the overhead low while maximizing the utility of the format to enable certain functionalities. OMAF now targets the social VR use case, which has been discussed in the research literature for a while and, finally, makes its way into standardization. An important aspect here is both user and quality of experience, which requires intensive subjective testing.

Finally, on May 10 MPEG will celebrate 30 years as its first meeting dates back to 1988 in Ottawa, Canada with around 30 attendees. The 122nd meeting had more than 500 attendees and MPEG has around 20 active work items. A total of more than 170 standards have been produces (that’s approx. six standards per year) where some standards have up to nine editions like the HEVC standards. Overall, MPEG is responsible for more that 23% of all JTC 1 standards and some of them showing extraordinary longevity regarding extensions, e.g., MPEG-2 systems (24 years), MPEG-4 file format (19 years), and AVC (15 years). MPEG standards serve billions of users (e.g., MPEG-1 video, MP2, MP3, AAC, MPEG-2, AVC, ISOBMFF, DASH). Some — more precisely five — standards have receive Emmy awards in the past (MPEG-1, MPEG-2, AVC (2x), and HEVC).
Tag cloud generated from all existing MPEG press releases.
Thus, happy birthday MPEG! In today’s society starts the high performance era with 30 years, basically the time of “compression”, i.e., we apply all what we learnt and live out everything, truly optimistic perspective for our generation X (millennials) standards body!