Tuesday, August 3, 2021

MPEG news: a report from the 135th meeting (virtual)

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG News Archive

The 135th MPEG meeting was once again held as an online meeting, and the official press release can be found here and comprises the following items:

  • MPEG Video Coding promotes MPEG Immersive Video (MIV) to the FDIS stage
  • Verification tests for more application cases of Versatile Video Coding (VVC)
  • MPEG Systems reaches first milestone for Video Decoding Interface for Immersive Media
  • MPEG Systems further enhances the extensibility and flexibility of Network-based Media Processing
  • MPEG Systems completes support of Versatile Video Coding and Essential Video Coding in High Efficiency Image File Format
  • Two MPEG White Papers:
    • Versatile Video Coding (VVC)
    • MPEG-G and its application of regulation and privacy

In this column, I’d like to focus on MIV and VVC including systems-related aspects as well as a brief update about DASH (as usual).

MPEG Immersive Video (MIV)

At the 135th MPEG meeting, MPEG Video Coding has promoted the MPEG Immersive Video (MIV) standard to the Final Draft International Standard (FDIS) stage. MIV was developed to support compression of immersive video content in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables storage and distribution of immersive video content over existing and future networks for playback with 6 Degrees of Freedom (6DoF) of view position and orientation.

From a technical point of view, MIV is a flexible standard for multiview video with depth (MVD) that leverages the strong hardware support for commonly used video codecs to code volumetric video. Each source view may use one of three projection formats: (i) equirectangular, (ii) perspective, or (iii) orthographic. By pruning and packing views, MIV can achieve bit rates around 25 Mb/s and a pixel rate equivalent to HEVC Level 5.2.
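To see why pruning and packing matter, a back-of-the-envelope calculation is instructive. The Python sketch below assumes only the HEVC Level 5.2 luma sample-rate limit (MaxLumaSr in HEVC Table A.8); the rig size, view resolution, and frame rate are hypothetical numbers chosen for illustration.

```python
# Back-of-the-envelope sketch: why MIV prunes and packs views.
# Assumption: HEVC Level 5.2 permits at most 1,069,547,520 luma samples/s
# (MaxLumaSr, HEVC Table A.8); all other numbers are purely illustrative.

HEVC_L52_MAX_LUMA_SR = 1_069_547_520  # luma samples per second

def full_views_in_budget(width: int, height: int, fps: int) -> int:
    """How many *unpruned* views of this size fit into the decoder budget."""
    return HEVC_L52_MAX_LUMA_SR // (width * height * fps)

# A hypothetical capture rig: 4K equirectangular views at 30 fps.
print(full_views_in_budget(4096, 2048, 30))  # -> 4

# With, say, 16 source views, pruning must remove roughly three quarters
# of the samples before the packed atlases respect the same budget.
views, w, h, fps = 16, 4096, 2048, 30
required_pruning = 1 - HEVC_L52_MAX_LUMA_SR / (views * w * h * fps)
print(f"{required_pruning:.0%}")  # -> 73%
```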

The MIV standard is designed as a set of extensions and profile restrictions for the Visual Volumetric Video-based Coding (V3C) standard (ISO/IEC 23090-5). The main body of this standard is shared between MIV and the Video-based Point Cloud Coding (V-PCC) standard (ISO/IEC 23090-5 Annex H). It may also be used by other MPEG-I volumetric codecs under development. The carriage of MIV is specified through the Carriage of V3C Data standard (ISO/IEC 23090-10).

The test model and objective metrics are publicly available at https://gitlab.com/mpeg-i-visual.

At the same time, MPEG Systems has begun developing the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13), which specifies the input and output interfaces of video decoders to provide more flexible use of video decoder resources for such applications. At the 135th MPEG meeting, MPEG Systems reached the first formal milestone of developing ISO/IEC 23090-13 by promoting the text to Committee Draft ballot status. The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures such that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions that are actually presented to the users rather than only the number of video elementary streams in use.
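As the VDI text has only just reached Committee Draft status, no final API exists yet. The Python sketch below therefore only illustrates the core idea, i.e., that the portions of many elementary streams that are actually presented can share fewer physical decoder resources; the class names, numbers, and the greedy admission rule are all invented for this illustration.

```python
# Conceptual sketch of the VDI idea (not an implementation of the standard):
# schedule only the *presented* portions of elementary streams onto a shared
# decoding budget, so fewer physical decoders than streams are needed.
from dataclasses import dataclass

@dataclass
class ElementaryStream:
    stream_id: str
    visible: bool          # is any region of this stream currently presented?
    luma_sample_rate: int  # decoding load of the portion that must be decoded

class DecoderPool:
    def __init__(self, max_total_sample_rate: int):
        self.budget = max_total_sample_rate

    def schedule(self, streams):
        """Greedily admit visible streams into the shared decoding budget."""
        admitted, used = [], 0
        for s in sorted((s for s in streams if s.visible),
                        key=lambda s: s.luma_sample_rate):
            if used + s.luma_sample_rate <= self.budget:
                admitted.append(s.stream_id)
                used += s.luma_sample_rate
        return admitted

# One HEVC Level 5.2-class decoder serving four tiled streams, one of which
# lies outside the viewport and hence needs no decoder resources at all.
pool = DecoderPool(max_total_sample_rate=1_069_547_520)
streams = [ElementaryStream("tile-0", True, 300_000_000),
           ElementaryStream("tile-1", True, 300_000_000),
           ElementaryStream("tile-2", False, 300_000_000),
           ElementaryStream("tile-3", True, 300_000_000)]
print(pool.schedule(streams))  # -> ['tile-0', 'tile-1', 'tile-3']
```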

Research aspects: It seems that visual compression and systems standards enabling immersive media applications and services are becoming mature. However, the Quality of Experience (QoE) of such applications and services is still in its infancy. The QUALINET White Paper on Definitions of Immersive Media Experience (IMEx) provides a survey of definitions of immersion and presence, which leads to a definition of Immersive Media Experience (IMEx). Consequently, the next step is working towards QoE metrics in this domain, which requires subjective quality assessments, an endeavor facing various challenges during the current COVID-19 pandemic.

Versatile Video Coding (VVC) updates

The third round of verification testing for Versatile Video Coding (VVC) has been completed. This round tested High Dynamic Range (HDR) content at 4K ultra-high-definition (UHD) resolution using the Hybrid Log-Gamma (HLG) and Perceptual Quantization (PQ) video formats. The test was conducted using state-of-the-art high-quality consumer displays, emulating an internet streaming-type scenario.

VVC showed approximately 50% bit rate reduction on average compared to High Efficiency Video Coding (HEVC).

Additionally, the ISO/IEC 23008-12 Image File Format has been amended to support images coded using Versatile Video Coding (VVC) and Essential Video Coding (EVC).

Research aspects: The results of the verification tests are usually publicly available and can be used as a baseline for future improvements of the respective standards, including their evaluation. For example, the tradeoff between compression efficiency and encoding runtime (time complexity) for live and video-on-demand scenarios is always an interesting research aspect.
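As a starting point for such evaluations, encoding runtime and resulting file size can be measured across operating points. The Python sketch below uses x265 presets as a stand-in, since VVC encoder support in ffmpeg builds still varies; it assumes an ffmpeg binary with libx265 on the PATH, and input.y4m is a placeholder for a test sequence.

```python
# Minimal sketch of the compression-efficiency vs. encoding-runtime tradeoff:
# time the same sequence across encoder presets and compare output sizes.
# Assumes ffmpeg with libx265; "input.y4m" is a hypothetical test sequence.
import os
import subprocess
import time

def encode(preset):
    out = f"out_{preset}.mp4"
    t0 = time.perf_counter()
    subprocess.run(["ffmpeg", "-y", "-i", "input.y4m",
                    "-c:v", "libx265", "-preset", preset, "-crf", "28", out],
                   check=True, capture_output=True)
    return time.perf_counter() - t0, os.path.getsize(out)

for preset in ["ultrafast", "medium", "veryslow"]:
    seconds, size = encode(preset)
    print(f"{preset:>9s}: {seconds:7.1f} s, {size / 1e6:7.2f} MB")
```

For a quality-aware comparison, the size column would be replaced by a rate-distortion measurement (e.g., BD-rate over several quality points).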

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 135th MPEG meeting, MPEG Systems issued a draft amendment to the core MPEG-DASH specification (i.e., ISO/IEC 23009-1). It provides further improvements of Preroll, which is renamed to Preperiod, and will be further discussed during the Ad-hoc Group (AhG) period (please join the DASH email list for further details/announcements). Additionally, this amendment includes minor improvements for nonlinear playback. The so-called Technologies under Consideration (TuC) document comprises new proposals that have not yet reached consensus for promotion to any official standards document (e.g., amendments to existing DASH standards or new parts); among others, proposals for minimizing initial delay are currently being discussed. Finally, libdash has been updated to support the MPEG-DASH schema according to the 5th edition.

An updated overview of DASH standards/features can be found in the Figure below.

MPEG-DASH status of July 2021.

Research aspects: The informative aspects of MPEG-DASH, such as adaptive bitrate (ABR) algorithms, have been subject to research for many years. New editions of the standard mostly introduced incremental improvements, but disruptive ideas rarely reached the surface. Perhaps it's time to take a step back and re-think how streaming should work for today's and future media applications and services.
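To illustrate how small the informative client-side part can be, the following Python sketch implements a bare throughput-based ABR rule; the bitrate ladder, the safety margin, and the rule itself are illustrative simplifications of what deployed players actually do.

```python
# A deliberately minimal throughput-based ABR rule: pick the highest
# representation whose bitrate stays below a safety fraction of the
# measured throughput. All numbers are illustrative.
def choose_representation(bitrates_bps, throughput_bps, safety=0.8):
    """bitrates_bps: the available representations, sorted ascending."""
    affordable = [b for b in bitrates_bps if b <= safety * throughput_bps]
    return affordable[-1] if affordable else bitrates_bps[0]

ladder = [1_000_000, 2_500_000, 5_000_000, 8_000_000]
print(choose_representation(ladder, 6_000_000))  # -> 2500000 (0.8 * 6 Mb/s = 4.8 Mb/s)
```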

The 136th MPEG meeting will again be an online meeting in October 2021, but MPEG is aiming to meet in person again in January 2022 (if possible). Click here for more information about MPEG meetings and their developments.

Wednesday, July 28, 2021

Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming

Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming
Picture Coding Symposium (PCS)
29 June-2 July 2021, Bristol, UK

Vignesh V Menon, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: Since video accounts for the majority of today’s internet traffic, the popularity of HTTP Adaptive Streaming (HAS) is increasing steadily. In HAS, each video is encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to a heterogeneity of network conditions, device characteristics, and end-user preferences. Most of the streaming services utilize cloud-based encoding techniques which enable a fully parallel encoding process to speed up the encoding and consequently to reduce the overall time complexity. State-of-the-art approaches further improve the encoding process by utilizing encoder analysis information from already encoded representation(s) to improve the encoding time complexity of the remaining representations. In this paper, we investigate various multi-encoding algorithms (i.e., multi-rate and multi-resolution) and propose novel multi-encoding algorithms for large-scale HTTP Adaptive Streaming deployments. Experimental results demonstrate that the proposed multi-encoding algorithm optimized for the highest compression efficiency reduces the overall encoding time by 39% with a 1.5% bitrate increase compared to stand-alone encodings. Its variant optimized for the highest time savings reduces the overall encoding time by 50% with a 2.6% bitrate increase compared to stand-alone encodings.
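The orchestration idea behind such multi-rate encoding can be sketched in a few lines of Python: encode a reference representation first and let the dependent representations reuse its analysis information. The sketch below uses the standalone x265 CLI; note that the analysis-reuse option names (--analysis-save / --analysis-load here) have changed across x265 releases, and the bitrate ladder and the choice of the lowest bitrate as reference are illustrative rather than the paper's exact scheme.

```python
# Illustrative multi-rate encoding orchestration: the reference representation
# saves its encoder analysis, the dependent representations load and reuse it.
# Assumes the x265 CLI; option names follow recent releases and may differ.
import subprocess

LADDER_KBPS = [1000, 2500, 5000, 8000]  # hypothetical bitrate ladder

def encode(bitrate_kbps, save=None, load=None):
    cmd = ["x265", "--input", "input.y4m",
           "--bitrate", str(bitrate_kbps),
           "--output", f"rep_{bitrate_kbps}.hevc"]
    if save:
        cmd += ["--analysis-save", save]
    if load:
        cmd += ["--analysis-load", load]
    subprocess.run(cmd, check=True)

# Reference first (here: the lowest bitrate), then the dependent encodings,
# which could also run in parallel once the reference analysis exists.
encode(LADDER_KBPS[0], save="ref_analysis.dat")
for kbps in LADDER_KBPS[1:]:
    encode(kbps, load="ref_analysis.dat")
```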

Keywords: HTTP Adaptive Streaming, HEVC, Multi-rate Encoding, Multi-encoding.

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Wednesday, July 14, 2021

Call for Papers: ViSNext 2021 Workshop at the ACM CoNEXT 2021 Conference

ViSNext’21: 1st ACM CoNEXT Workshop on Design, Deployment, and Evaluation of Network-assisted Video Streaming

In recent years, we have witnessed phenomenal growth in live video traffic over the Internet, accelerated by the rise of novel video streaming technologies, advancements in networking paradigms, and our ability to generate, process, and display videos on heterogeneous devices. Given the constraints and limitations of the different components on the video delivery path from the origin server to the clients, the network plays an essential role in boosting the Quality of Experience (QoE) perceived by clients. The ViSNext workshop aims to bring together researchers and developers working on all aspects of video streaming, in particular network-assisted concepts backed up by experimental evidence. We warmly invite the submission of original, previously unpublished papers addressing key issues in this area, including but not limited to:

  • Design, analysis, and evaluation of network-assisted multimedia system architectures
  • Optimization of edge, fog, and mobile edge computing for video streaming applications
  • Optimization of caching policies/systems for video streaming applications
  • Network-assisted resource allocation for video streaming
  • Experience and lessons learned by deploying large-scale network-assisted video streaming
  • Internet measurement and modeling for enhancing QoE in video streaming applications
  • Design, analysis, and evaluation of network-assisted Adaptive Bitrate (ABR) streaming
  • Network aspects in video streaming: cloud computing, virtualization techniques, network control and management, including SDN, NFV, and network programmability
  • Routing and traffic engineering in end-to-end video streaming
  • Topics at the intersection of energy-efficient computing and networking for video streaming
  • Network-assisted techniques for low-latency video streaming
  • Machine learning for improving QoE in video streaming applications
  • Machine learning for traffic engineering and congestion control for video streaming
  • Solutions for improving streaming QoE for high-speed user mobility
  • Analysis, modeling, and experimentation of DASH
  • Big data analytics at the network edge to assess viewer experience of adaptive video
  • Reproducible research in adaptive video streaming: datasets, evaluation methods, benchmarking, standardization efforts, open-source tools
  • Novel use cases and applications in the area of adaptive video streaming
  • Advanced network-based techniques for point clouds, light field, and immersive video
  • Low delay and multipath video communication

ViSNext’21 Co-Chairs

  • Farzad Tashtarian, Alpen-Adria-Universität Klagenfurt, Austria
  • Christian Timmerer, Alpen-Adria-Universität Klagenfurt, Austria
  • Halima Elbiaze, Université du Québec à Montréal, Canada
  • Tim Wauters, Ghent University, Belgium

Submission Instructions

  • Solicited submissions include both full technical workshop papers and position (white) papers. Submissions may be up to 6 pages (excluding references) in 2-column 10pt ACM format.
  • Papers must include author names and affiliations for single-blind peer review by the program committee. Authors of accepted submissions are expected to present and discuss their work at the workshop. Register and submit your paper here.

Important Dates

  • Paper Submission: Sep 17, 2021
  • Notification of Acceptance: Oct 18, 2021
  • Camera-ready: Oct 25, 2021
  • Workshop Event: Dec 6, 2021

Contact Us

Any questions regarding submission issues should be directed to visnext21@itec.aau.at.

Monday, July 5, 2021

IEEE OJ-SP: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning

Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning
IEEE Open Journal of Signal Processing
[PDF]

Ekrem Çetinkaya, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: Video streaming applications keep getting more attention over the years, and HTTP Adaptive Streaming (HAS) became the de-facto solution for video delivery over the Internet. In HAS, each video is encoded at multiple quality levels and resolutions (i.e., representations) to enable adaptation of the streaming session to viewing and network conditions of the client. This requirement brings encoding challenges along with it, e.g., a video source should be encoded efficiently at multiple bitrates and resolutions. Fast multi-rate encoding approaches aim to address this challenge of encoding multiple representations from a single video by re-using information from already encoded representations. In this paper, a convolutional neural network is used to speed up both multi-rate and multi-resolution encoding for HAS. For multi-rate encoding, the lowest bitrate representation is chosen as the reference. For multi-resolution encoding, the highest bitrate from the lowest resolution representation is chosen as the reference. Pixel values from the target resolution and encoding information from the reference representation are used to predict Coding Tree Unit (CTU) split decisions in High-Efficiency Video Coding (HEVC) for dependent representations. Experimental results show that the proposed method for multi-rate encoding can reduce the overall encoding time by 15.08% and parallel encoding time by 41.26%, with a 0.89% bitrate increase compared to the HEVC reference software. Simultaneously, the proposed method for multi-resolution encoding can reduce the encoding time by 46.27% for the overall encoding and 27.71% for the parallel encoding on average with a 2.05% bitrate increase.
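The core building block of this approach, a CNN that looks at the target-resolution pixels of a CTU together with encoding information from the reference representation, can be sketched as follows in Python/PyTorch. The layer sizes, the depth-map input encoding, and the three-logit head are invented for illustration and are not the architecture from the paper.

```python
# Illustrative-only model in the spirit of the paper: predict HEVC CTU split
# decisions for a dependent representation from target-resolution luma pixels
# plus the split depths of the already-encoded reference representation.
import torch
import torch.nn as nn

class CTUSplitNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Channel 0: luma of the 64x64 target CTU; channel 1: reference depths.
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # One split/no-split logit per quadtree level below the 64x64 root.
        self.head = nn.Linear(32, 3)

    def forward(self, luma_ctu, ref_depths):
        x = torch.cat([luma_ctu, ref_depths], dim=1)   # (N, 2, 64, 64)
        return self.head(self.features(x).flatten(1))  # (N, 3) logits

model = CTUSplitNet()
logits = model(torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64))
print(logits.shape)  # torch.Size([8, 3])
```

Skipping the rate-distortion search for the predicted split decisions is where the encoding-time savings come from; the small bitrate increase is the price of occasional mispredictions.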

Keywords: HTTP Adaptive Streaming, HEVC, Multirate Encoding, Machine Learning

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Thursday, June 10, 2021

IEEE VCIP 2021 Special Sessions

IEEE VCIP 2021 Special Sessions

http://www.vcip2021.org/

Submission of Papers for Regular, Demo, and Special Sessions (extended): June 27, 2021
Paper Acceptance Notification: August 30, 2021

Title: Learning-based Image and Video Coding

Organizers: João Ascenso (Instituto Superior Técnico), Elena Alshina (Huawei)

Description: Image and video coding algorithms create compact representations of an image by exploiting its spatial redundancy and perceptual irrelevance, i.e., the characteristics of the human visual system. Recently, data-driven algorithms such as neural networks have attracted a lot of attention and have become a popular area of research and development. This interest is driven by several factors, such as recent advances in processing power (cheap and powerful hardware), the availability of large datasets (big data), and several algorithmic and architectural advances (e.g., generative adversarial networks).

Nowadays, neural networks are the state of the art for several computer vision tasks, such as those requiring a high-level understanding of image semantics (e.g., image classification, object segmentation, and saliency detection), but also for low-level image processing tasks such as image denoising, inpainting, and super-resolution. These advances have led to an increased interest in applying deep neural networks to image and video coding, which is now the main focus of the JPEG AI and the JVET NN activities within the JPEG and MPEG standardization committees.

The aim of these novel image and video coding solutions is to design a compact representation model that has been obtained (learned) from a large amount of visual data and can efficiently represent the wide variety of images and videos that are consumed today. Some of the available learning-based image coding solutions already show very promising experimental results in terms of rate-distortion (RD) performance, notably in comparison with conventional standard image codecs (especially HEVC Intra and VVC Intra) which code the image data with hand-crafted transforms, entropy coding, and quantization schemes.
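To make the learned-transform idea concrete, here is a deliberately tiny end-to-end trainable codec in Python/PyTorch. Everything in it, from the layer sizes to the additive-noise quantization proxy and the crude rate term, is illustrative; actual JPEG AI / JVET candidates use far richer entropy models and architectures.

```python
# Minimal sketch of an end-to-end learned image codec: an autoencoder trained
# with a rate-distortion objective. Additive uniform noise is the usual
# differentiable stand-in for quantization during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLearnedCodec(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1))

    def forward(self, x):
        y = self.enc(x)  # learned analysis transform
        y_hat = y + torch.rand_like(y) - 0.5 if self.training else torch.round(y)
        return self.dec(y_hat), y_hat  # learned synthesis transform

model = TinyLearnedCodec().train()
x = torch.rand(4, 3, 64, 64)
x_hat, y_hat = model(x)
distortion = F.mse_loss(x_hat, x)
rate_proxy = y_hat.abs().mean()  # crude proxy; real codecs learn an entropy model
loss = distortion + 0.01 * rate_proxy  # the weight trades off rate vs. distortion
loss.backward()
```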

This special session on Learning-based Image and Video Coding gathers technical contributions that demonstrate efficient coding of image and video content based on learning-based approaches. This topic has received many contributions in recent years and is considered critical for the future of both image and video coding, especially for solutions adopting end-to-end training as well as for solutions where learning-based tools replace conventional tools.

Monday, June 7, 2021

IEEE TNSM: OSCAR: On Optimizing Resource Utilization in Live Video Streaming

OSCAR: On Optimizing Resource Utilization in Live Video Streaming

[PDF]
DOI: 10.1109/TNSM.2021.3051950

Alireza Erfanian, Farzad Tashtarian, Anatoliy Zabrovskiy, Christian Timmerer, Hermann Hellwagner
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: Live video streaming traffic and related applications have experienced significant growth in recent years. However, this has been accompanied by some challenging issues, especially in terms of resource utilization. Although IP multicasting can be recognized as an efficient mechanism to cope with these challenges, it suffers from many problems. Applying software-defined networking (SDN) and network function virtualization (NFV) technologies enables researchers to cope with IP multicasting issues in novel ways. In this paper, by leveraging the SDN concept, we introduce OSCAR (Optimizing reSourCe utilizAtion in live video stReaming) as a new cost-aware video streaming approach to provide advanced video coding (AVC)-based live streaming services in the network. In this paper, we use two types of virtualized network functions (VNFs): virtual reverse proxy (VRP) and virtual transcoder function (VTF). At the edge of the network, VRPs are responsible for collecting clients’ requests and sending them to an SDN controller. Then, by executing a mixed-integer linear program (MILP), the SDN controller determines a group of optimal multicast trees for streaming the requested videos from an appropriate origin server to the VRPs. Moreover, to elevate the efficiency of resource allocation and meet the given end-to-end latency threshold, OSCAR delivers only the highest requested quality from the origin server to an optimal group of VTFs over a multicast tree. The selected VTFs then transcode the received video segments and transmit them to the requesting VRPs in a multicast fashion. To mitigate the time complexity of the proposed MILP model, we present a simple and efficient heuristic algorithm that determines a near-optimal solution in polynomial time. Using the MiniNet emulator, we evaluate the performance of OSCAR in various scenarios. The results show that OSCAR surpasses other SVC- and AVC-based multicast and unicast approaches in terms of cost and resource utilization.
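To give a flavor of what such a mixed-integer program looks like, the Python sketch below formulates a heavily simplified, facility-location-style toy version with PuLP: open transcoder nodes (VTFs) and assign edge proxies (VRPs) to them at minimum total cost. It is not the paper's formulation; the topology, the costs, and the single-quality assumption are invented for illustration.

```python
# Toy MILP in the spirit of OSCAR (much simplified): jointly choose which
# transcoder nodes (VTFs) to run and which node serves each edge proxy (VRP).
# Requires the PuLP package (pip install pulp).
import pulp

nodes = ["n1", "n2", "n3"]               # candidate VTF locations
vrps = ["vrp1", "vrp2"]                  # edge proxies aggregating client requests
open_cost = {"n1": 4, "n2": 6, "n3": 5}  # cost of running a VTF at a node
link_cost = {("vrp1", "n1"): 2, ("vrp1", "n2"): 5, ("vrp1", "n3"): 4,
             ("vrp2", "n1"): 6, ("vrp2", "n2"): 1, ("vrp2", "n3"): 3}

prob = pulp.LpProblem("oscar_toy", pulp.LpMinimize)
opened = pulp.LpVariable.dicts("open", nodes, cat="Binary")
assign = pulp.LpVariable.dicts("assign", link_cost.keys(), cat="Binary")

# Objective: VTF running costs plus delivery costs on the chosen links.
prob += (pulp.lpSum(open_cost[n] * opened[n] for n in nodes)
         + pulp.lpSum(link_cost[e] * assign[e] for e in link_cost))

for v in vrps:  # every VRP is served by exactly one VTF ...
    prob += pulp.lpSum(assign[(v, n)] for n in nodes) == 1
for (v, n) in link_cost:  # ... and only by a VTF that is actually running
    prob += assign[(v, n)] <= opened[n]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(n, opened[n].value()) for n in nodes])
```

The actual model additionally captures multicast tree construction, link capacities, transcoding between quality levels, and end-to-end latency constraints, which is what makes the heuristic presented in the paper necessary.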

Keywords: Dynamic Adaptive Streaming over HTTP (DASH), Live Video Streaming, Software Defined Networking (SDN), Video Transcoding, Network Function Virtualization (NFV).

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Wednesday, June 2, 2021

ISM’20: Dynamic Segment Repackaging at the Edge for HTTP Adaptive Streaming

Dynamic Segment Repackaging at the Edge for HTTP Adaptive Streaming

IEEE International Symposium on Multimedia (ISM)
2-4 December 2020, Naples, Italy

Jesús Aguilar Armijo, Babak Taraghi, Christian Timmerer, and Hermann Hellwagner
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: Adaptive video streaming systems typically support different media delivery formats, e.g., MPEG-DASH and HLS, replicating the same content multiple times into the network. Such a diversified system results in inefficient use of storage, caching, and bandwidth resources. The Common Media Application Format (CMAF) emerges to simplify HTTP Adaptive Streaming (HAS), providing a single encoding and packaging format of segmented media content and offering the opportunities of bandwidth savings, more cache hits, and less storage needed. However, CMAF is not yet supported by most devices. To solve this issue, we present a solution where we maintain the main advantages of CMAF while supporting heterogeneous devices using different media delivery formats. For that purpose, we propose to dynamically convert the content from CMAF to the desired media delivery format at an edge node. We study the bandwidth savings with our proposed approach using an analytical model and simulation, resulting in bandwidth savings of up to 20% with different media delivery format distributions.
We analyze the runtime impact of the required operations on the segmented content performed in two scenarios: the classic one, with four different media delivery formats, and the proposed scenario, using CMAF-only delivery through the network. We compare both scenarios with different edge compute power assumptions. Finally, we perform experiments in a real video streaming testbed delivering MPEG-DASH using CMAF content to serve a DASH and an HLS client, performing the media conversion for the latter one.
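The intuition behind the bandwidth savings can be condensed into a toy cache-fill model, sketched below in Python: classic delivery may pull one copy per delivery format from the origin, whereas CMAF pulls a single copy that is repackaged at the edge. This deliberately ignores request popularity and cache dynamics, so it overstates the effect compared to the paper's analytical model and its reported savings of up to 20%.

```python
# Toy cache-fill model (illustrative only): origin traffic per segment when
# every requested delivery format must be fetched and cached separately,
# versus a single CMAF copy converted at the edge.
def origin_fetches(format_share, segment_size):
    """One cache fill per delivery format that receives any requests."""
    return segment_size * sum(1 for share in format_share.values() if share > 0)

share = {"DASH": 0.50, "HLS": 0.40, "Smooth": 0.07, "HDS": 0.03}  # made-up mix
classic = origin_fetches(share, segment_size=1.0)  # four copies fetched
cmaf = 1.0                                         # one CMAF copy, repackaged
print(f"origin-traffic reduction: {1 - cmaf / classic:.0%}")  # -> 75%
```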

Keywords: CMAF, Edge Computing, HTTP Adaptive Streaming (HAS)

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.