Wednesday, August 15, 2018

MPEG news: a report from the 123rd meeting, Ljubljana, Slovenia

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.
The MPEG press release comprises the following topics:

  • MPEG issues Call for Evidence on Compressed Representation of Neural Networks
  • Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work
  • MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media
  • MPEG releases software for MPEG-I visual activities
  • MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

MPEG issues Call for Evidence on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, translation and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. As a consequence, trained neural networks contain a large number of parameters (weights), resulting in a quite large size (e.g., several hundred MBs). Many applications require the deployment of a particular trained network instance, potentially to a large number of devices, which may have limitations in terms of processing power and memory (e.g., mobile devices or smart cameras). Any use case in which a trained neural network (and its updates) needs to be deployed to a number of devices could thus benefit from a standard for the compressed representation of neural networks.
At its 123rd meeting, MPEG issued a Call for Evidence (CfE) for compression technology for neural networks. The compression technology will be evaluated in terms of compression efficiency, runtime, memory consumption, and the impact on performance in three use cases: visual object classification, visual feature extraction (as used in MPEG Compact Descriptors for Visual Analysis) and filters for video coding. Responses to the CfE will be analyzed on the weekend prior to and during the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: As this is about "compression" of structured data, research aspects will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects such as transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms may become relevant in the (near) future. Furthermore, additional use cases should be communicated to MPEG before the next meeting.
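To make the notion of "compression efficiency" versus "impact on performance" a bit more tangible, here is a minimal sketch of lossy weight compression: uniform 8-bit quantization followed by a generic lossless coder. This is not a technology from the CfE or from any response to it; the function names, the float32 baseline, and the use of zlib are assumptions chosen purely for illustration.

```python
import struct
import zlib


def quantize_uniform(weights, bits=8):
    """Map float weights to integer codes in [0, 2**bits - 1] (illustrative only)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the upper end.
    codes = [min(levels, max(0, round((w - lo) / step))) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate weights from the integer codes."""
    return [lo + c * step for c in codes]


def compression_report(weights, bits=8):
    """Compare a float32 baseline with quantized, zlib-compressed codes."""
    if bits > 8:
        raise ValueError("this sketch stores one code per byte, so bits must be <= 8")
    raw = struct.pack(f"<{len(weights)}f", *weights)  # assumed float32 baseline
    codes, lo, step = quantize_uniform(weights, bits)
    packed = zlib.compress(bytes(codes))
    restored = dequantize(codes, lo, step)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    return {
        "raw_bytes": len(raw),
        "compressed_bytes": len(packed),
        "ratio": len(raw) / len(packed),
        "max_abs_error": max_err,
        "step": step,
    }
```

The report exposes both sides of the trade-off: the size ratio, and the reconstruction error that would have to be checked against task accuracy (e.g., classification performance) in a real evaluation.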

Network-Based Media Processing – MPEG evaluates responses to call for proposal and kicks off its technical work

Recent developments in multimedia have brought significant innovation and disruption to the way multimedia content is created and consumed. At its 123rd meeting, MPEG analyzed the technologies submitted by eight industry leaders as responses to the Call for Proposals (CfP) for Network-Based Media Processing (NBMP, MPEG-I Part 8). These technologies address advanced media processing use cases such as network stitching for virtual reality (VR) services, super-resolution for enhanced visual quality, transcoding by a mobile edge cloud, or viewport extraction for 360-degree video within the network environment. NBMP allows service providers and end users to describe media processing operations that are to be performed by the entities in the networks. NBMP will describe the composition of network-based media processing services out of a set of NBMP functions and make these NBMP services accessible through Application Programming Interfaces (APIs).
NBMP will support the existing delivery methods such as streaming, file delivery, push-based progressive download, hybrid delivery, and multipath delivery within heterogeneous network environments. MPEG issued a Call for Proposals (CfP) seeking technologies that allow end-user devices, which are limited in processing capabilities and power consumption, to offload certain kinds of processing to the network.
After a formal evaluation of submissions, MPEG selected three technologies as starting points for the (i) workflow, (ii) metadata, and (iii) interfaces for static and dynamically acquired NBMP. A key conclusion of the evaluation was that NBMP can significantly improve the performance and efficiency of the cloud infrastructure and media processing services.
Research aspects: I reported on NBMP in my previous post and basically the same applies here. NBMP will be particularly interesting in the context of new networking approaches including, but not limited to, software-defined networking (SDN), information-centric networking (ICN), mobile edge computing (MEC), fog computing, and related aspects in the context of 5G.

MPEG finalizes 1st edition of Technical Report on Architectures for Immersive Media

At its 123rd meeting, MPEG finalized the first edition of its Technical Report (TR) on Architectures for Immersive Media. This report constitutes the first part of the MPEG-I standard for the coded representation of immersive media and introduces the eight MPEG-I parts currently under specification in MPEG. In particular, it addresses three Degrees of Freedom (3DoF; three rotational and unlimited movements around the X, Y and Z axes (respectively pitch, yaw and roll)), 3DoF+ (3DoF with additional limited translational movements (typically, head movements) along X, Y and Z axes), and 6DoF (3DoF with full translational movements along X, Y and Z axes) experiences but it mostly focuses on 3DoF. Future versions are expected to cover aspects beyond 3DoF. The report documents use cases and defines architectural views on elements that contribute to an overall immersive experience. Finally, the report also includes quality considerations for immersive services and introduces minimum requirements as well as objectives for a high-quality immersive media experience.
Research aspects: ISO/IEC technical reports are typically publicly available and provide informative descriptions of what the standard is about. In MPEG-I this technical report can be used as a guideline for possible architectures for immersive media. This first edition focuses on three Degrees of Freedom (3DoF; three rotational and unlimited movements around the X, Y and Z axes (respectively pitch, yaw and roll)) and outlines the other degrees of freedom currently foreseen in MPEG-I. It also highlights use cases and quality-related aspects that could be of interest for the research community.

MPEG releases software for MPEG-I visual activities

MPEG-I visual is an activity that addresses the specific requirements of immersive visual media for six degrees of freedom virtual walkthroughs with correct motion parallax within a bounded volume. MPEG-I visual covers application scenarios from 3DoF+ with slight body and head movements in a sitting position to 6DoF allowing some walking steps from a central position. At the 123rd MPEG meeting, important progress was achieved in software development. A new Reference View Synthesizer (RVS 2.0) has been released for 3DoF+, allowing the synthesis of virtual viewpoints from an unlimited number of input views. RVS integrates code bases from Université Libre de Bruxelles and Philips, who acted as software coordinator. A Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility, essential to 3DoF+ and 6DoF activities, has been developed by Zhejiang University. WS-PSNR is a full reference objective quality metric for all flavors of omnidirectional video. RVS and WS-PSNR are essential software tools for the upcoming Call for Proposals on 3DoF+ expected to be released at the 124th MPEG meeting in October 2018 (Macau, CN).
Research aspects: MPEG does not only produce text specifications but also reference software and conformance bitstreams, which are important assets for both research and development. Thus, it is very much appreciated to have a new Reference View Synthesizer (RVS 2.0) and a Weighted-to-Spherically-uniform PSNR (WS-PSNR) software utility available, which enable interoperability and reproducibility of R&D efforts/results in this area.
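For readers unfamiliar with WS-PSNR, the following sketch shows the basic idea for the equirectangular projection (ERP): each pixel's squared error is weighted by the cosine of its latitude, so the oversampled polar regions count less than the equator. This is not the Zhejiang University utility; it is a plain-Python illustration for a single-channel frame given as a list of rows, and the function names and the 8-bit peak value are assumptions.

```python
import math


def erp_row_weights(height):
    """Per-row ERP weights: cosine of the latitude at each row centre."""
    return [math.cos((j + 0.5 - height / 2) * math.pi / height) for j in range(height)]


def ws_psnr(ref, test, max_value=255.0):
    """WS-PSNR of two equally sized single-channel ERP frames (lists of rows)."""
    height, width = len(ref), len(ref[0])
    weights = erp_row_weights(height)
    weighted_error = 0.0
    weight_sum = 0.0
    for j in range(height):
        row_error = sum((a - b) ** 2 for a, b in zip(ref[j], test[j]))
        weighted_error += weights[j] * row_error
        weight_sum += weights[j] * width
    wmse = weighted_error / weight_sum
    if wmse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / wmse)
```

With these weights, a given distortion near a pole lowers the score less than the same distortion at the equator, which is the behaviour that distinguishes WS-PSNR from conventional PSNR on ERP content.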

MPEG enhances ISO Base Media File Format (ISOBMFF) with new features

At the 123rd MPEG meeting, a couple of new amendments related to ISOBMFF have reached the first milestone. Amendment 2 to ISO/IEC 14496-12 6th edition will add the option to have relative addressing as an alternative to offset addressing, which in some environments and workflows can simplify the handling of files and will allow creation of derived visual tracks using items and samples in other tracks with some transformation, for example rotation. Another amendment that reached its first milestone is the first amendment to ISO/IEC 23001-7 3rd edition. It will allow the use of multiple keys for a single sample and the scrambling of some parts of AVC or HEVC video bitstreams without breaking conformance to the existing decoders. That is, the bitstream will be decodable by existing decoders, but some parts of the video will be scrambled. It is expected that these amendments will reach the final milestone in Q3 2019.
Research aspects: The ISOBMFF reference software is now available on GitHub, which is a valuable service to the community and allows for active participation in standardization even from outside of MPEG. It is recommended that interested parties have a look at it and consider contributing to this project.

What else happened at #MPEG123?

  • The MPEG-DASH 3rd edition is finally available as output document (N17813; only available to MPEG members) combining the 2nd edition, four amendments, and two corrigenda. We expect final publication later this year or early next year.
  • There is a new DASH amendment and there are corrigenda items in the pipeline, which should progress to final stages some time next year. The status of MPEG-DASH (July 2018) can be seen below.
  • MPEG received a rather interesting input document related to “streaming first” which resulted in a publicly available output document entitled “thoughts on adaptive delivery and access to immersive media”. The key idea here is to focus on streaming (first) rather than on file/encapsulation formats typically used for storage (and streaming second). This document should become available here.
  • For a couple of meetings now, MPEG has maintained a standardization roadmap highlighting recent/major MPEG standards and documenting the roadmap for the next five years. It is definitely worth keeping this in mind when defining/updating your own roadmap.
  • JVET/VVC issued Working Draft 2 of Versatile Video Coding (N17732 | JVET-K1001) and Test Model 2 of Versatile Video Coding (VTM 2) (N17733 | JVET-K1002). Please note that N-documents are MPEG internal but JVET-documents are publicly accessible here. An interesting aspect is that VTM2/WD2 should have >20% rate reduction compared to HEVC, all with reasonable complexity, and the next benchmark set (BMS) should have close to 30% rate reduction vs. HEVC. Further improvements are expected from (a) improved merge, intra prediction, etc., (b) decoder-side estimation with low complexity, (c) multi-hypothesis prediction and OBMC, (d) diagonal and other geometric partitioning, (e) secondary transforms, (f) new approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.), (g) current picture referencing, palette, and (h) neural networks.
  • In addition to VVC -- which is a joint activity with VCEG --, MPEG is working on two video-related exploration activities, namely (a) an enhanced quality profile of the AVC standard and (b) a low complexity enhancement video codec. Both topics will be further discussed within respective Ad-hoc Groups (AhGs) and further details are available here.
  • Finally, MPEG established an Ad-hoc Group (AhG) dedicated to the long-term planning which is also looking into application areas/domains other than media coding/representation.
In this context it is probably worth mentioning the following DASH awards at recent conferences
Additionally, there have been two tutorials at ICME related to MPEG standards, which you may find interesting

Tuesday, August 7, 2018

A Survey on Bitrate Adaptation Schemes for Streaming Media over HTTP

[PDF] *** open access ***

Abdelhak Bentaleb, Member, IEEE, Bayan Taani, Member, IEEE, Ali C. Begen, Senior Member, IEEE, Christian Timmerer, Senior Member, IEEE, and Roger Zimmermann, Senior Member, IEEE

HAS adaptation scheme classification.
Abstract --- In this survey, we present state-of-the-art bitrate adaptation algorithms for HTTP adaptive streaming (HAS). As a key distinction from other streaming approaches, the bitrate adaptation algorithms in HAS are chiefly executed at each client, i.e., in a distributed manner. The objective of these algorithms is to ensure a high Quality of Experience (QoE) for viewers in the presence of bandwidth fluctuations due to factors like signal strength, network congestion, network reconvergence events, etc. While such fluctuations are common in public Internet, they can also occur in home networks or even managed networks where there is often admission control and QoS tools. Bitrate adaptation algorithms may take factors like bandwidth estimations, playback buffer fullness, device features, viewer preferences, and content features into account, albeit with different weights. Since the viewer’s QoE needs to be determined in real-time during playback, objective metrics are generally used including number of buffer stalls, duration of startup delay, frequency and amount of quality oscillations, and video instability. By design, the standards for HAS do not mandate any particular adaptation algorithm, leaving it to system builders to innovate and implement their own method. This survey provides an overview of the different methods proposed over the last several years.
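As a rough illustration of the kind of client-side logic the abstract refers to, the sketch below combines a throughput estimate with playback buffer fullness to pick the next segment's bitrate. It is not an algorithm taken from the survey; the harmonic-mean estimator, the safety factor, the low-buffer threshold, and all names are assumptions made for this example.

```python
def harmonic_mean(samples):
    """Harmonic mean, which dampens the effect of short throughput spikes."""
    return len(samples) / sum(1.0 / s for s in samples)


def select_bitrate(ladder_kbps, throughput_samples_kbps, buffer_s,
                   low_buffer_s=5.0, safety=0.8):
    """Pick a bitrate (kbps) for the next segment; purely illustrative heuristic."""
    ladder = sorted(ladder_kbps)
    # With no measurements yet, or a nearly empty buffer, play it safe.
    if not throughput_samples_kbps or buffer_s < low_buffer_s:
        return ladder[0]
    # Otherwise choose the highest rung that fits a discounted throughput estimate.
    budget = safety * harmonic_mean(throughput_samples_kbps)
    candidates = [rate for rate in ladder if rate <= budget]
    return candidates[-1] if candidates else ladder[0]
```

For example, with a ladder of 500, 1000, 2500 and 5000 kbps, three throughput samples of 4000 kbps and 20 seconds of buffered video, the budget is 3200 kbps and the sketch selects 2500 kbps; with only 2 seconds buffered it falls back to 500 kbps.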

Citation: A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer and R. Zimmermann, "A Survey on Bitrate Adaptation Schemes for Streaming Media over HTTP," in IEEE Communications Surveys & Tutorials.
doi: 10.1109/COMST.2018.2862938

Thursday, July 26, 2018

DASH-IF awarded Grand Challenge on Dynamic Adaptive Streaming over HTTP at IEEE ICME 2018

July 25, ICME 2018, San Diego, CA, USA

Real-time entertainment services such as streaming video and audio are currently accounting for more than 70% of the Internet traffic during peak hours. Interestingly, these services are all delivered over-the-top (OTT) of the existing networking infrastructure using HTTP. The MPEG Dynamic Adaptive Streaming over HTTP (DASH) standard enables smooth multimedia streaming towards heterogeneous devices.

The MPEG DASH standard provides an interoperable representation format but deliberately does not define the adaptation behavior for the client implementations, which is left open for research and industry competition. In a typical deployment, the encoding itself is optimized for the respective delivery channels but - as the content is delivered over the top without any guarantees - various issues during the streaming (e.g., high startup delay, stalls/re-buffering, high switching frequency, inefficient network utilization, unfairness to competing network traffic, etc.) may limit the quality of experience (QoE) as perceived by the viewers.

The aim of this grand challenge is to solicit contributions addressing end-to-end delivery aspects that will help improve the QoE while optimally using the network resources at an acceptable cost. Such aspects include, but are not limited to, content preparation for adaptive streaming, delivery in the Internet and streaming client implementations.

A special focus of 2018’s grand challenge will be related to immersive media applications and services including omnidirectional/360-degree videos.

The winner will be awarded €750 and the runner-up €250.

Each submission was presented at IEEE ICME 2018 within an oral session, which was very well attended.

(photos by C. Timmerer)

This year's award goes to the following papers:

WINNER: "Game Theory Based Bitrate Adaptation for dash.js Reference Player" by Abdelhak Bentaleb, Ali Begen, Roger Zimmermann

Christian Timmerer (left), Abdelhak Bentaleb, Vasudev Bhaskaran, Lei Zhang.

RUNNER-UP: "Tile-based QoE-driven HTTP/2 Streaming System for 360 Video" by Zhimin Xu, Yixuan Ban, Kai Zhang, Lan Xie, Xinggong Zhang, Zongming Guo, Shengbin Meng, Yue Wang

Christian Timmerer (left), Zongming Guo, Vasudev Bhaskaran, Lei Zhang.

We would like to congratulate all winners. 

Thursday, July 19, 2018


The Institute of Information Technology at Alpen-Adria-Universität Klagenfurt, Austria, announces the following job vacancy:

(fixed-term employment for the period of 3 years, 30 hours/week) 

at the Faculty of Technical Sciences, Institute of Information Technology (ITEC). The gross salary per month for this position is € 2,112.40 (pre tax, 14 times a year) – i.e., the default salary as defined by the Austrian Science Fund (FWF). Estimated commencement of duties will be the 1st of October, 2018.

Your main duties include conducting scientific research in the context of the OVID (“Relevance Detection in Ophthalmic Surgery Videos”) FWF research project with the goal to publish in international, high-quality conferences and journals. You will work under supervision of experienced researchers at ITEC and in cooperation with other doctoral candidates working on the same project.

More precisely, your duties will be:
  • Independent scientific research in the field of medical video content analysis and machine learning with the ultimate goal to obtain the PhD degree at Alpen-Adria-Universität Klagenfurt 
  • Collaboration with the “Medical Multimedia” research team at ITEC in terms of research and (optional) teaching
  • Participation in supervision of master students 
  • Assistance with project-related administrative tasks within the department and in university committees 
  • Assistance with project-related public relations activities within the institute and faculty 
Your profile:
  • Master or diploma degree of Technical Science in the field of Computer Science, completed at a domestic or foreign university (with good final grades) 
  • Knowledge and experience in: multimedia content analysis, software engineering, and preferably machine learning 
  • Excellent programming skills, especially in C++, Java, and Python 
  • Fluency in English, both in written and oral form 
All relevant documents for the application (at least a curriculum vitae and the master’s certificate including final grades) have to be submitted via e-mail to Assoc.Prof. DI Dr. Klaus Schöffmann no later than the 15th of September, 2018.

Monday, July 2, 2018

Internet-QoE'18 Keynote: HTTP Adaptive Streaming - State of the Art and Challenges Ahead


Vienna, Austria, July 2, 2018
co-located with IEEE ICDCS 2018

Abstract: Real-time entertainment services deployed over the open, unmanaged Internet – streaming audio and video – now account for more than 70% of the Internet traffic and it is assumed that this number will reach 80% by 2021. The technology used for such services is commonly referred to as HTTP Adaptive Streaming (HAS) and is widely adopted by various platforms such as YouTube, Netflix, Flimmit, etc. thanks to the standardization of MPEG-DASH and HLS. This talk will provide an overview of HAS and the state of the art of selected deployment options, and review work in progress as well as challenges ahead. The main challenge can be characterized by the fact that (i) content complexity increases, (ii) delay or latency are vital application requirements, and (iii) Quality of Experience cannot be neglected anymore.

Friday, June 15, 2018

DASH-IF awarded Excellence in DASH award at ACM MMSys 2018

The DASH Industry Forum Excellence in DASH Award at ACM MMSys 2018 acknowledges papers substantially addressing MPEG-DASH as the presentation format that are selected for presentation at ACM MMSys 2018. Preference is given to practical enhancements and developments which can sustain future commercial usefulness of DASH. The DASH format used should conform to the DASH-IF Interoperability Points. The award is a financial prize as follows: First place – €1000; Second place – €500; and Third place – €250. The winners are chosen by a DASH Industry Forum appointed committee and results are final.

Christian Timmerer (left) and Viswanathan (Vishy) Swaminathan (right)

This year's award goes to the following papers (two first places, one second, and one third):

1. Kevin Spiteri, Ramesh Sitaraman, Daniel Sparacio. From Theory to Practice: Improving Bitrate Adaptation in the DASH Reference Player

Christian Timmerer (from left to right), Kevin Spiteri, Ramesh Sitaraman, and Viswanathan (Vishy) Swaminathan
1. Abdelhak Bentaleb, Ali C. Begen, Roger Zimmermann, Saad Harous. Want to Play DASH? GTA: A Game Theoretic Approach for Adaptive Streaming over HTTP

Christian Timmerer (from left to right), Roger Zimmermann, Abdelhak Bentaleb, Ali C. Begen, and Viswanathan (Vishy) Swaminathan
2. Savino Dambra, Giuseppe Samela, Lucile Sassatelli, Romaric Pighetti, Ramon Aparicio Pardo, Anne-Marie Pinna-Déry. Film Editing: New Levers to Improve VR Streaming

Christian Timmerer (from left to right), Lucile Sassatelli, and Viswanathan (Vishy) Swaminathan
3. S. Silva, J. Bruneau-Queyreix, M. Lacaud, D. Négru, L. Réveillère, MUSLIN: Achieving High, Fairly Shared QoE Through Multi-Source Live Streaming

Christian Timmerer (from left to right), Simon Da Silva, and Viswanathan (Vishy) Swaminathan
We would like to congratulate all winners and hope to see you next year at ACM MMSys 2019.

Tuesday, June 12, 2018

Packet Video 2018: Best Paper Award

The TPC chairs of Packet Video 2018 identified the following candidates based on the reviews:
  • A. Zare, A. Aminlou, M. Hannuksela, 6K Effective Resolution with 4K HEVC Decoding Capability for OMAF-compliant 360° Video Streaming
  • J. Schneider, M. Bläser, M. Wien, Sparse Coding based Frequency Adaptive Loop Filtering for Video Coding
  • S. Arisu, A. Begen, Quickly Starting Media Streams Using QUIC
... and the winner of the Packet Video 2018 Best Paper Award is:

6K Effective Resolution with 4K HEVC Decoding Capability for OMAF-compliant 360° Video Streaming
A. Zare, A. Aminlou, M. Hannuksela

Abstract: The recent Omnidirectional MediA Format (OMAF) standard specifies delivery of 360° video content. OMAF supports only equirectangular (ERP) and cubemap projections and their region-wise packing with a limitation on video decoding capability to the maximum resolution of 4K (e.g., 4096x2048). Streaming of 4K ERP content allows only a limited viewport resolution, which is lower than the resolution of many current head-mounted displays (HMDs). In order to take the full advantage of those HMDs, this work proposes a specific mixed-resolution packing of 6K (6144x3072) ERP content and its realization in tile-based streaming, while complying with the 4K-decoding constraint and the High Efficiency Video Coding (HEVC) standard. Experimental results indicate that, using Zonal-PSNR test methodology, the proposed layout decreases the streaming bitrate up to 32% in terms of BD-rate, when compared to mixed-quality viewport-adaptive streaming of 4K ERP as an alternative solution.

T. Schierl (left), A. Zare (middle), R. Zimmermann (right)