Tuesday, October 30, 2018

MPEG news: a report from the 124th meeting, Macau, China

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The MPEG press release comprises the following aspects:
  • Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage
  • Compressed Representation of Neural Networks - MPEG issues Call for Proposals
  • Low Complexity Video Coding Enhancements - MPEG issues Call for Proposals
  • New Video Coding Standard expected to have licensing terms timely available - MPEG issues Call for Proposals
  • Multi-Image Application Format (MIAF) promoted to Final Draft International Standard
  • 3DoF+ Draft Call for Proposal goes Public

Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage

At its 124th meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to Committee Draft (CD) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colour. By leveraging existing video codecs and the video ecosystem in general (hardware acceleration, transmission services and infrastructure), as well as future video codecs, the V-PCC technology enables new applications. The current V-PCC encoder implementation provides a compression ratio of 125:1, which means that a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality.
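As a quick plausibility check of these numbers, consider the following back-of-the-envelope calculation (a sketch only; the 30 fps frame rate and the assumption that the 125:1 ratio applies to the full raw signal are mine, not from the press release):

```python
# Back-of-the-envelope check of the quoted V-PCC figures.
# Assumptions (not from the press release): 30 fps, and the 125:1
# ratio applying to the full raw point-cloud signal.
compressed_bps = 8e6                      # quoted 8 Mbit/s
ratio = 125                               # quoted compression ratio
raw_bps = compressed_bps * ratio          # implied raw rate: 1 Gbit/s
points_per_frame = 1_000_000
fps = 30
raw_bits_per_point = raw_bps / (fps * points_per_frame)
print(f"implied raw rate: {raw_bps / 1e9:.1f} Gbit/s")
print(f"implied raw size: {raw_bits_per_point:.1f} bits/point")
# ~33 bits/point, i.e., roughly 10-bit x/y/z geometry per point;
# colour attributes would increase the implied raw rate accordingly.
```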

The next step is the storage of V-PCC in ISOBMFF, for which a working draft has been produced. It is expected that further details will be discussed in upcoming reports.
Research aspects: Video-based Point Cloud Compression (V-PCC) is at CD stage and a first working draft for the storage of V-PCC in ISOBMFF has been provided. A natural next step is the delivery of V-PCC encapsulated in ISOBMFF over networks utilizing various approaches, protocols, and tools. Additionally, one may also think of using different encapsulation formats if needed. I hope to see some of these aspects covered in future conferences, including -- but not limited to -- those listed at the very end of this blog post.

MPEG issues Call for Proposals on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and on the availability of large-scale training data sets. Some applications require the deployment of a particular trained network instance to a potentially large number of devices and, thus, could benefit from a standard for the compressed representation of neural networks. Therefore, MPEG has issued a Call for Proposals (CfP) for compression technology for neural networks, focusing on the compression of parameters and weights, with four use cases: (i) visual object classification, (ii) audio classification, (iii) visual feature extraction (as used in MPEG CDVA), and (iv) video coding.
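To illustrate what "compression of parameters and weights" can mean in its simplest form, here is a toy sketch (my own illustration, not one of the technologies sought by the CfP) that applies uniform 8-bit quantization to a weight tensor:

```python
import numpy as np

def quantize(weights: np.ndarray, bits: int = 8):
    """Uniform scalar quantization of a weight tensor (lossy)."""
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / (2 ** bits - 1)
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

w = np.random.randn(1_000_000).astype(np.float32)  # stand-in for DNN weights
q, lo, scale = quantize(w)
w_hat = dequantize(q, lo, scale)
print("compression ratio:", w.nbytes / q.nbytes)   # 4:1 before entropy coding
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Actual CfP responses can be expected to go well beyond this, e.g., pruning, entropy coding, and task-aware quantization, evaluated against the impact on classification or coding performance.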
Research aspects: As pointed out last time, research here will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as the transmission of compressed artificial neural networks within lossy, large-scale environments, including update mechanisms, may become relevant in the (near) future.

MPEG issues Call for Proposals on Low Complexity Video Coding Enhancements

Upon request from the industry, MPEG has identified an area of interest in which video technology deployed in the market (e.g., AVC, HEVC) can be enhanced in terms of video quality without necessarily replacing existing hardware. Therefore, MPEG has issued a Call for Proposals (CfP) on Low Complexity Video Coding Enhancements.

The objective is to develop video coding technology with a data stream structure defined by two component streams: a base stream decodable by a hardware decoder and an enhancement stream suitable for software implementation. The project is meant to be codec agnostic; in other words, the base encoder and base decoder can be AVC, HEVC, or any other codec in the market.
Research aspects: The interesting aspect here is that this use case assumes a legacy base decoder, most likely realized in hardware, which is enhanced with a software-based implementation to improve coding efficiency and/or quality without increasing complexity, and thus energy consumption, on the end-user device.
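A conceptual sketch of this two-stream structure is given below (an assumed flow under my own simplifications, not the design to be standardized): a hardware-decoded base picture is upscaled and corrected by a residual carried in the software-decoded enhancement stream.

```python
import numpy as np

def enhance(base: np.ndarray, residual: np.ndarray) -> np.ndarray:
    # Nearest-neighbour 2x upscaling stands in for the real upsampler.
    prediction = base.repeat(2, axis=0).repeat(2, axis=1)
    # The software-decoded enhancement stream carries the residual.
    out = prediction.astype(np.int16) + residual
    return np.clip(out, 0, 255).astype(np.uint8)

base = np.full((540, 960), 128, dtype=np.uint8)    # from the hardware decoder
residual = np.zeros((1080, 1920), dtype=np.int16)  # from the software decoder
frame = enhance(base, residual)                    # full-resolution output
```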

MPEG issues Call for Proposals for a New Video Coding Standard expected to have licensing terms timely available

At its 124th meeting, MPEG issued a Call for Proposals (CfP) for a new video coding standard to address combinations of both technical and application (i.e., business) requirements that may not be adequately met by existing standards. The aim is to provide a standardized video compression solution which combines coding efficiency similar to that of HEVC with a level of complexity suitable for real-time encoding/decoding and the timely availability of licensing terms.
Research aspects: This new work item is more related to business aspects (i.e., licensing terms) than to technical aspects of video coding. As this blog is about technical aspects, and I am not an expert in licensing terms, I will not comment on this any further.

Multi-Image Application Format (MIAF) promoted to Final Draft International Standard

The Multi-Image Application Format (MIAF) defines interoperability points for the creation, reading, parsing, and decoding of images embedded in the High Efficiency Image File (HEIF) format by (i) only defining additional constraints on the HEIF format, (ii) limiting the supported encoding types to a set of specific profiles and levels, (iii) requiring specific metadata formats, and (iv) defining a set of brands for signaling such constraints, including specific depth map and alpha plane formats. For instance, it addresses use cases such as a capturing device that uses one of the HEIF codecs with a specific HEVC profile and level in the files it creates, while a playback device is only capable of decoding AVC bitstreams.
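Since MIAF conformance is signalled via brands in the HEIF file's 'ftyp' box, a minimal check can be done with a few lines of code (a sketch assuming a well-formed file whose first box is 'ftyp'; 'sample.heic' is a placeholder path):

```python
import struct

def read_brands(path: str):
    """Return (major_brand, compatible_brands) from the ftyp box."""
    with open(path, "rb") as f:
        size, box_type = struct.unpack(">I4s", f.read(8))
        assert box_type == b"ftyp", "expected ftyp as the first box"
        major, _minor_version = struct.unpack(">4sI", f.read(8))
        payload = f.read(size - 16)  # remaining bytes: compatible brands
        brands = [payload[i:i + 4].decode("ascii")
                  for i in range(0, len(payload), 4)]
        return major.decode("ascii"), brands

major, compatible = read_brands("sample.heic")
print(major, compatible)  # MIAF files carry 'miaf' among the brands
```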
Research aspects: MIAF is an application format which is defined as a combination of tools (incl. profiles and levels) of other standards (e.g., audio codecs, video codecs, systems) to address the needs of a specific application. Thus, the research is related to use cases enabled by this application format.

3DoF+ Draft Call for Proposal goes Public

Following investigations on the coding of “three Degrees of Freedom plus” (3DoF+) content in the context of MPEG-I, the MPEG video subgroup has provided evidence demonstrating the capability to encode 3DoF+ content efficiently while maintaining compatibility with legacy HEVC hardware. As a result, MPEG decided to issue a draft Call for Proposals (CfP) to the public containing the information necessary to prepare for the final Call for Proposals, expected to be issued at the 125th MPEG meeting (January 2019) with responses due at the 126th MPEG meeting (March 2019).
Research aspects: This work item is about video (coding) and, thus, research is about compression efficiency.

What else happened at #MPEG124?

  • MPEG-DASH 3rd edition is still in the final editing phase and not yet available. Last time, I wrote that we expect final publication later this year or early next year, and we hope this is still the case. At this meeting, Amendment 5 progressed to DAM, and conformance/reference software for SRD, SAND, and server push was also promoted to DAM. In other words, DASH is pretty much in maintenance mode.
  • MPEG-I (systems part) is working on immersive media access and delivery, and I guess more updates will come on this after the next meeting. OMAF is working on a 2nd edition, for which a working draft exists, and phase 2 use cases (public document) and draft requirements are being discussed.
  • Versatile Video Coding (VVC): working draft 3 (WD3) and test model 3 (VTM3) have been issued at this meeting, including a large number of new tools. Both documents (and software) will be publicly available after the editing periods (Nov 23 for WD3 and Dec 14 for VTM3). JVET documents are publicly available at http://phenix.it-sudparis.eu/jvet/.

Last but not least, some ads...

Tuesday, October 9, 2018

AAU and Bitmovin presenting at IEEE ICIP 2018

The IEEE International Conference on Image Processing (ICIP), with more than 1,000 attendees, is one of the biggest conferences of the IEEE Signal Processing Society. At ICIP'18, Anatoliy (AAU) and I (AAU/Bitmovin) attended with the following presentations:

On Monday, October 8, I was on the panel of the Young Professional Networking Event (together with Amy Reibman and Sheila Hemami) sharing my experiences with all attendees. See one picture here.

On Tuesday, October 9, I presented at the Innovation Program talking about "Video Coding for Large-Scale HTTP Adaptive Streaming Deployments: State of the Art and Challenges Ahead".



On Wednesday, October 10, Anatoliy presented our joint AAU/Bitmovin paper about "A Practical Evaluation of Video Codecs for Large-Scale HTTP Adaptive Streaming Services". Abstract: The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today’s internet traffic. Although the internet bandwidth also increases constantly, audio-visual compression technology is inevitable and we are currently facing the challenge to be confronted with multiple video codecs. This paper provides a practical evaluation of state-of-the-art video codecs (i.e., AV1, AVC/libx264, HEVC/libx265, VP9/libvpx-vp9) for large-scale HTTP adaptive streaming services. In anticipation of the results, AV1 shows promising performance compared to established video codecs. Additionally, AV1 is intended to be royalty free, making it worthwhile to be considered for large-scale HTTP adaptive streaming services.


Slides: A Practical Evaluation of Video Codecs for Large-Scale HTTP Adaptive Streaming Services (Christian Timmerer)

Acknowledgment: This work was supported in part by the Austrian Research Promotion Agency (FFG) under the Next Generation Video Streaming project “PROMETHEUS”.

Tuesday, October 2, 2018

Almost 58 percent of downstream traffic on the internet is video

The Global Internet Phenomena Report provided by Sandvine - together with Cisco's Visual Networking Index (VNI) - has always been a good source on how internet traffic evolves over time, specifically in the context of streaming audio and video content (note: Nielsen's Law of Internet Bandwidth is worth noting here as well, as is Bitmovin's Video Developer Survey). I have used this report in many of my presentations to highlight the 'importance of multimedia delivery'. Thus, I'm happy to see that on October 2, 2018, Sandvine released a new version of its Global Internet Phenomena Report after a rather long break of two years.

The report is available here with some highlights reported below.

Almost 58% of downstream traffic on the internet is video, and Netflix accounts for 15% of the total downstream traffic volume across the entire internet.

The streaming video traffic share comes with some regional differences (see figure below). Netflix dominates video streaming in the Americas (30.71%), whereas EMEA is led by YouTube (30.39%), and APAC is not dominated by any single streaming service but by generic "HTTP media stream" traffic (29.24%). Overall, Netflix and YouTube together still account for approx. 50% of the global video streaming traffic share.


More to come later...

Monday, September 24, 2018

2018 Video Developer Survey

Bitmovin's 2018 Video Developer Survey reveals interesting details about

  • streaming formats (MPEG-DASH, HLS, CMAF, etc.),
  • video codecs (AVC, HEVC, VP9, AV1),
  • audio codecs (AAC, MP3, Dolby, etc.),
  • encoding preferences (hardware, software on-premise/cloud),
  • players (open source, commercial, in-house solution),
  • DRM,
  • monetization model,
  • ad standard/technology, and
  • the biggest problems experienced today (DRM, ad blockers, ads in general, server-side ad insertion, CDN issues, broadcast delay/latency, getting playback to run on all devices).
For example, the figure below illustrates the planned video codec usage in the next 12 months compared to the 2017 report.
Planned video codec usage in the next 12 months 2017 vs. 2018.
In total, 456 survey submissions from over 67 countries were received and included in the report, which can be downloaded here for free.

Friday, August 24, 2018

Doctoral Students & Postdoctoral Researchers within the H2020 project ARTICONF

The Institute of Information Technology at the Alpen-Adria-Universität Klagenfurt invites applications for:

DOCTORAL STUDENTS & POSTDOCTORAL RESEARCHERS
within the H2020 project ARTICONF
“smART socIal media eCOsystem in a blockchaiN Federated environment” 


at the Faculty of Technical Sciences. The monthly salary for these positions is based on the standard salaries of the Austrian collective agreement.

The research group “Distributed Systems” conducts research in the fields of parallel computing, cloud computing, and multimedia systems. The goal is to publish in international, high-quality journals and conference proceedings and to cooperate with various commercial partners. With regard to teaching, our research group covers additional fields such as computer networks, operating systems, distributed systems, and compiler construction.

Your profile:
  • Doctoral, master's, or diploma degree in Technical Sciences in the field of Computer Science, completed at a domestic or foreign university (with good final grades); 
  • Interest and experience in parallel and distributed computing, Big Data, and energy efficiency; 
  • Excellent English skills, both in written and oral form. 
Desirable qualifications include:
  • Excellent programming skills, especially C, C++ and Java; 
  • Experience in programming distributed systems, especially OpenStack; 
  • Experience in programming parallel systems (multithreading, multiprocessing, OpenMP, MPI, CUDA, OpenCL); 
  • Relevant international and practical work experience; 
  • Social and communicative competences and ability to work in a team; 
  • Experience with university teaching and research activities. 
The working language and the research program are in English. There is no need to learn German for this position.

Submit all relevant documents, including copies of all school certificates and performance records, by email to:

Prof. Radu Prodan
Institute of Information Technology, Alpen-Adria-Universität Klagenfurt
Universitätsstraße 65 – 67
9020 Klagenfurt, Austria
Email: radu.prodan@aau.at

URL: www.aau.at/tewi/inf/itec/

Wednesday, August 15, 2018

MPEG news: a report from the 123rd meeting, Ljubljana, Slovenia

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.
The MPEG press release comprises the following topics:

  • MPEG issues Call for Evidence on Compressed Representation of Neural Networks
  • Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
  • MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
  • MPEG releases software for MPEG-I visual activities
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation, and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and on the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in quite large sizes (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a larger number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.
At its 123rd meeting, MPEG issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, and memory consumption, as well as the impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Video Analysis, CDVA), and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: As this is about "compression" of structured data, research aspects will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as the transmission of compressed artificial neural networks within lossy, large-scale environments, including update mechanisms, may become relevant in the (near) future. Furthermore, additional use cases should be communicated to MPEG until the next meeting.

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, or viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by entities in the network. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and make these NBMP services accessible through Application Programming Interfaces (APIs).
NBMP will support the existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. MPEG issued a Call for Proposal (CfP) seeking technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.
After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.
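To make the idea more concrete, a hypothetical workflow description could look as follows (purely illustrative; the field names are my own invention and not the normative NBMP metadata, which was still under development at this point):

```python
# Hypothetical NBMP-style workflow description (illustrative only):
# an end-user device asks the network to run a transcoding function.
workflow = {
    "name": "edge-transcode-example",
    "functions": [
        {
            "id": "transcode-1",
            "task": "transcode",  # offloaded to a mobile edge cloud
            "input": {"protocol": "dash", "url": "https://example.com/in.mpd"},
            "output": {"codec": "hevc", "bitrate_bps": 3_000_000},
        }
    ],
}
```

A workflow manager in the network would map such a description onto concrete processing entities and expose the result through the NBMP APIs.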
Research aspects: I reported about NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123rd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along the X, Y and Z axes), and 6DoF (3DoF with full translational movements along the X, Y and Z axes) experiences, but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.
Research aspects: ISO/IEC technical reports are typically publicly available and provide informative descriptions of what the standard is about. In MPEG-I, this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on three Degrees of Freedom (3DoF; three rotational and un-limited movements around the X, Y and Z axes (respectively pitch, yaw and roll)) and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest to the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six degrees of freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+ with slight body and head movements in a sitting position to 6DoF allowing some walking steps from a central position. At the 123rd MPEG meeting, important progress has been achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, allowing the synthesis of virtual viewpoints from an unlimited number of input views. RVS integrates code bases from the Université Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to the 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full-reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+, expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).
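WS-PSNR follows the published definition quite directly: the per-pixel squared error is weighted by the spherical sampling density, which for equirectangular projection boils down to a cos(latitude) weight per row. A minimal sketch for a single ERP luma plane (my own condensation of the metric, not the MPEG utility itself):

```python
import numpy as np

def ws_psnr_erp(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """WS-PSNR for one equirectangular (ERP) plane of shape (H, W)."""
    h, w = ref.shape
    rows = np.arange(h)
    weights = np.cos((rows + 0.5 - h / 2) * np.pi / h)  # cos(latitude) per row
    wmat = np.repeat(weights[:, None], w, axis=1)
    err = ref.astype(np.float64) - dist.astype(np.float64)
    wmse = np.sum(wmat * err ** 2) / np.sum(wmat)       # weighted MSE
    return 10 * np.log10(max_val ** 2 / wmse)
```

The weighting de-emphasizes the over-sampled polar regions of an ERP frame, so distortions near the poles count less than the same distortions near the equator.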
Research aspects: MPEG does not only produce text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. Thus, it is very much appreciated to have a new Reference View Synthesizer (RVS 2.0) and a Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available, which enable interoperability and reproducibility of R&D efforts/results in this area.

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF reached the first milestone. Amendment 2 to ISO/IEC 14496-12 6th edition will add the option to use relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files, and will allow the creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment that reached its first milestone is the first amendment to ISO/IEC 23001-7 3rd edition. It will allow the use of multiple keys for a single sample and the scrambling of some parts of AVC or HEVC video bitstreams without breaking conformance to existing decoders. That is, the bitstream will remain decodable by existing decoders, but some parts of the video will be scrambled. It is expected that these amendments will reach the final milestone in Q3 2019.
Research aspects: The ISOBMFF reference software is now available on GitHub, which is a valuable service to the community and allows for active participation in standardization even from outside of MPEG. It is recommended that interested parties have a look at it and consider contributing to this project.

What else happened at #MPEG123?

  • The MPEG-DASH 3rd edition is finally available as output document (N17813; only available to MPEG members) combining 2nd edition, four amendments, and 2 corrigenda. We expect final publication later this year or early next year.
  • A new DASH amendment and corrigenda items are in the pipeline and should progress to final stages some time next year. The status of MPEG-DASH (July 2018) can be seen below.
  • MPEG received a rather interesting input document related to “streaming first”, which resulted in a publicly available output document entitled “thoughts on adaptive delivery and access to immersive media”. The key idea here is to focus on streaming (first) rather than on file/encapsulation formats typically used for storage (and streaming second). This document should become available here.
  • For the last couple of meetings, MPEG has maintained a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It is definitely worth keeping this in mind when defining/updating your own roadmap.
  • JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG internal but JVET-documents are publicly accessible here: http://phenix.it-sudparis.eu/jvet/. An interesting aspect is that VTM2/WD2 should have >20% rate reduction compared to HEVC, all with reasonable complexity and the next benchmark set (BMS) should have close to 30% rate reduction vs. HEVC. Further improvements expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
  • In addition to VVC, which is a joint activity with VCEG, MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low complexity enhancement video codec. Both topics will be further discussed within respective Ad-hoc Groups (AhGs), and further details are available here.
  • Finally, MPEG established an Ad-hoc Group (AhG) dedicated to the long-term planning which is also looking into application areas/domains other than media coding/representation.
In this context, it is probably worth mentioning the following DASH awards at recent conferences.
Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting.

Tuesday, August 7, 2018

A Survey on Bitrate Adaptation Schemes for Streaming Media over HTTP


[PDF] *** open access ***

Abdelhak Bentaleb, Member, IEEE, Bayan Taani, Member, IEEE, Ali C. Begen, Senior Member, IEEE, Christian Timmerer, Senior Member, IEEE, and Roger Zimmermann, Senior Member, IEEE

HAS adaptation scheme classification.
Abstract --- In this survey, we present state-of-the-art bitrate adaptation algorithms for HTTP adaptive streaming (HAS). As a key distinction from other streaming approaches, the bitrate adaptation algorithms in HAS are chiefly executed at each client, i.e., in a distributed manner. The objective of these algorithms is to ensure a high Quality of Experience (QoE) for viewers in the presence of bandwidth fluctuations due to factors like signal strength, network congestion, network reconvergence events, etc. While such fluctuations are common in the public Internet, they can also occur in home networks or even managed networks where there is often admission control and QoS tools. Bitrate adaptation algorithms may take factors like bandwidth estimations, playback buffer fullness, device features, viewer preferences, and content features into account, albeit with different weights. Since the viewer’s QoE needs to be determined in real-time during playback, objective metrics are generally used, including the number of buffer stalls, duration of startup delay, frequency and amount of quality oscillations, and video instability. By design, the standards for HAS do not mandate any particular adaptation algorithm, leaving it to system builders to innovate and implement their own method. This survey provides an overview of the different methods proposed over the last several years.
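As a flavor of the algorithms the survey classifies, here is a minimal hybrid throughput/buffer-based adaptation rule (an illustrative sketch; the thresholds and the 0.8 safety factor are arbitrary choices of mine, not from the paper):

```python
def select_bitrate(bitrates, throughput_bps, buffer_s,
                   low_buffer_s=10.0, safety=0.8):
    """Pick the highest representation that fits the throughput estimate,
    falling back to the lowest one when the buffer runs low."""
    if buffer_s < low_buffer_s:          # buffer-based guard: avoid stalls
        return min(bitrates)
    fitting = [b for b in sorted(bitrates) if b <= safety * throughput_bps]
    return fitting[-1] if fitting else min(bitrates)

ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]   # bitrate ladder (bps)
print(select_bitrate(ladder, throughput_bps=3_500_000, buffer_s=25.0))
# -> 2500000
```

Real schemes surveyed in the paper refine both signals considerably, e.g., smoothing the throughput estimate, using buffer-occupancy maps, or formulating the selection as a control or optimization problem.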

Citation: A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer and R. Zimmermann, "A Survey on Bitrate Adaptation Schemes for Streaming Media over HTTP," in IEEE Communications Surveys & Tutorials.
doi: 10.1109/COMST.2018.2862938