Saturday, January 14, 2023

MPEG news: a report from the 140th meeting

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.


After several years of online meetings, the 140th MPEG meeting was held as a face-to-face meeting in Mainz, Germany, and the official press release can be found here and comprises the following items:
  • MPEG evaluates the Call for Proposals on Video Coding for Machines
  • MPEG evaluates Call for Evidence on Video Coding for Machines Feature Coding
  • MPEG reaches the First Milestone for Haptics Coding
  • MPEG completes a New Standard for Video Decoding Interface for Immersive Media
  • MPEG completes Development of Conformance and Reference Software for Compression of Neural Networks
  • MPEG White Papers: (i) MPEG-H 3D Audio, (ii) MPEG-I Scene Description

Video Coding for Machines

Video coding is the process of compression and decompression of digital video content with the primary purpose of consumption by humans (e.g., watching a movie or video telephony). Recently, however, massive amounts of video data are increasingly analyzed without human intervention, leading to a new paradigm referred to as Video Coding for Machines (VCM), which targets both (i) conventional video coding and (ii) feature coding [1].

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, proposing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region-of-interest-based approach, where different areas of the frames are coded at varying qualities.

The responses to the CfP reported improvements in compression efficiency, in terms of bit rate reduction for equivalent task performance, of up to 57% for object tracking, up to 45% for instance segmentation, and up to 39% for object detection. Notably, all requirements defined by WG 2 were addressed by various proposals.
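To make such numbers concrete: savings of this kind are typically quantified as an average bitrate difference between two rate vs. task-performance curves at equal task performance, in the spirit of the Bjøntegaard delta. The sketch below illustrates this with invented rate/mAP points; it is not the CfP's exact evaluation methodology.

```python
import numpy as np

def bd_rate(rate_anchor, acc_anchor, rate_test, acc_test):
    """Bjontegaard-delta-style average bitrate difference (%) between two
    rate-vs-task-accuracy curves at equal accuracy. Negative = savings."""
    lr_a = np.log(rate_anchor)
    lr_t = np.log(rate_test)
    # Fit cubic polynomials of log-rate as a function of task accuracy.
    p_a = np.polyfit(acc_anchor, lr_a, 3)
    p_t = np.polyfit(acc_test, lr_t, 3)
    # Integrate both fits over the overlapping accuracy interval.
    lo = max(min(acc_anchor), min(acc_test))
    hi = min(max(acc_anchor), max(acc_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100

# Hypothetical rate (kbps) vs. mAP points for an anchor codec and a proposal.
anchor_rate, anchor_map = [400, 800, 1600, 3200], [0.42, 0.51, 0.58, 0.62]
test_rate, test_map = [250, 500, 1000, 2000], [0.43, 0.52, 0.59, 0.63]
print(f"BD-rate: {bd_rate(anchor_rate, anchor_map, test_rate, test_map):.1f}%")
```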

Furthermore, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Evidence (CfE) for technologies and solutions enabling efficient feature coding for machine vision tasks. A total of eight responses to this CfE were received, of which six responses were considered valid based on the conditions described in the call:
  • For the tested video dataset, increases in compression efficiency of up to 87% compared to the video anchor and of over 90% compared to the feature anchor were reported.
  • For the tested image dataset, the compression efficiency can be increased by over 90% compared to both image and feature anchors.
Research aspects: the main research area is still the same as described in my last blog post, i.e., compression efficiency (most likely including runtime, sometimes referred to as complexity) and Quality of Experience (QoE). Additional research aspects are related to the actual task for which video coding for machines is used (e.g., segmentation and object detection, as mentioned above).

Video Decoding Interface for Immersive Media

One of the most distinctive features of immersive media compared to 2D media is that only a tiny portion of the content is presented to the user. Such a portion is interactively selected at the time of consumption. For example, a user may not see the same point cloud object’s front and back sides simultaneously. Thus, for efficiency reasons and depending on the users’ viewpoint, only the front or back sides need to be delivered, decoded, and presented. Similarly, parts of the scene behind the observer may not need to be accessed.

At the 140th MPEG meeting, MPEG Systems (WG 3) reached the final milestone of the Video Decoding Interface for Immersive Media (VDI) standard (ISO/IEC 23090-13) by promoting the text to Final Draft International Standard (FDIS). The standard defines the basic framework and specific implementation of this framework for various video coding standards, including support for application programming interface (API) standards that are widely used in practice, e.g., Vulkan by Khronos.

The VDI standard allows for dynamic adaptation of video bitstreams to provide the decoded output pictures so that the number of actual video decoders can be smaller than the number of elementary video streams to be decoded. In other cases, virtual instances of video decoders can be associated with the portions of elementary streams required to be decoded. With this standard, the resource requirements of a platform running multiple virtual video decoder instances can be further optimized by considering the specific decoded video regions to be presented to the users rather than considering only the number of video elementary streams in use. The first edition of the VDI standard includes support for the following video coding standards: High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), and Essential Video Coding (EVC).
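The actual interfaces are defined in ISO/IEC 23090-13 itself; purely as an illustration of the resource-saving idea, the hypothetical sketch below assigns a small pool of physical decoders only to those elementary streams whose region of the immersive scene is currently visible. All names and numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class ElementaryStream:
    stream_id: str
    region: str          # e.g., a tile/patch of the immersive scene
    bitrate_kbps: int

def schedule_decoders(streams, visible_regions, max_decoders):
    """Hypothetical illustration: only streams whose region is currently
    visible are assigned to one of a small pool of physical decoders."""
    needed = [s for s in streams if s.region in visible_regions]
    # Assign the most demanding (highest-bitrate) streams first.
    needed.sort(key=lambda s: s.bitrate_kbps, reverse=True)
    return {s.stream_id: f"decoder-{i % max_decoders}"
            for i, s in enumerate(needed)}

streams = [
    ElementaryStream("pc-front", "front", 25_000),
    ElementaryStream("pc-back", "back", 25_000),
    ElementaryStream("bg-left", "left", 8_000),
]
# Only the front of the point cloud is in view: one decoder serves three streams.
print(schedule_decoders(streams, visible_regions={"front"}, max_decoders=2))
```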

Research aspects: VDI is a promising standard for enabling viewport-adaptive tile-based 360-degree video streaming, but its performance still needs to be assessed in various scenarios. Requesting and decoding individual tiles within a 360-degree video streaming application is a prerequisite for such efficiency gains, and VDI provides the basis for implementing it.

MPEG-DASH Updates

Finally, I'd like to provide a quick update regarding MPEG-DASH, which seems to be in maintenance mode. As mentioned in my last blog post, the output documents comprise amendments, Defects under Investigation (DuI), Technologies under Consideration (TuC), and a new working draft called Redundant encoding and packaging for segmented live media (REAP), which will eventually become ISO/IEC 23009-9. The scope of REAP is to define media formats for redundant encoding and packaging of live segmented media, media ingest, and asset storage. The current working draft can be downloaded here.

Research aspects: REAP defines a distributed system and, thus, all research aspects related to such systems apply here, e.g., performance and scalability, just to name a few.
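As a flavor of what redundant packaging has to deal with, the hypothetical sketch below merges the segment timelines of two redundant live encoders into one gapless timeline, so playback survives the loss of individual segments on either ingest path. This only illustrates the underlying idea; it is not the REAP format itself, and all names are invented.

```python
def merge_redundant_timelines(primary, backup):
    """Hypothetical sketch: merge segment timelines from two redundant
    encoders into one gapless timeline, preferring the primary encoder.
    Each timeline maps segment number -> URL (None = segment missing)."""
    merged = {}
    for seg in sorted(set(primary) | set(backup)):
        url = primary.get(seg) or backup.get(seg)
        if url is None:
            raise RuntimeError(f"segment {seg} lost on both ingest paths")
        merged[seg] = url
    return merged

primary = {1: "https://a.example/seg1.m4s", 2: None, 3: "https://a.example/seg3.m4s"}
backup  = {1: "https://b.example/seg1.m4s", 2: "https://b.example/seg2.m4s", 3: None}
print(merge_redundant_timelines(primary, backup))
```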

The 141st MPEG meeting will be online from January 16-20, 2023. Click here for more information about MPEG meetings and their developments.

Thursday, November 17, 2022

Doctoral Student Positions in the Intelligent Climate-Friendly Video Platform project called “GAIA”


The Institute of Information Technology (ITEC) at the Alpen-Adria-Universität Klagenfurt (AAU) invites applications for:

Doctoral Student Position (100% employment; all genders welcome)
within the Intelligent Climate-Friendly Video Platform project called “GAIA”

at the Faculty of Technical Sciences. The monthly salary for this position is based on the standard salaries of the Austrian collective agreement: min. € 3.058,60 pre-tax (14× per year) (Uni-KV: B1, https://www.aau.at/en/uni-kv). The expected start date of employment is April 1st, 2023.

AAU (ITEC) has been working on adaptive video streaming for more than a decade, has a proven record of successful research projects and publications in the field, and has been actively contributing to MPEG standardization for many years, including MPEG-DASH. 

The threat of climate change requires drastically reducing global greenhouse gas emissions in the next few years. At the Alpen-Adria-Universität Klagenfurt, we research how to reduce the energy consumption of digital technologies, particularly video streaming. To this end, we have recently started an Intelligent Climate-Friendly Video Platform project called “GAIA”. GAIA aims to research and develop energy-efficient video streaming approaches in all phases of the video delivery chain, from (i) video coding over (ii) video transmission to (iii) video decoding and playback on end devices. For further information about the project, please refer to the GAIA page and the Bitmovin news.

Your profile:

  • Master's or diploma degree in Technical Sciences in the field of Computer Science or Electrical Engineering, completed at a domestic or foreign university (with good final grades);
  • A high level of interest in conducting scientific research, team spirit, and communication skills;
  • Excellent English skills, both in written and oral form.

Desirable qualifications include:

  • Excellent programming skills, especially in Python, C, and C++;
  • Knowledge of energy-efficient video communication, virtualization and working with Docker, video streaming (one or more of the video delivery chain areas identified above), HTTP Adaptive Streaming, machine and deep learning, and/or cloud and edge computing;
  • Relevant international and practical work experience;
  • Social and communicative competencies and ability to work in a team;
  • Experience with research activities.

The working language and the research program are in English. There is no need to learn German for this position unless the applicant wants to participate in undergraduate teaching, which is optional.

Our offer:

  • Excellent opportunities to work in a lively research environment and collaborate with international colleagues
  • Personal and professional advanced training courses, management and career coaching
  • Numerous attractive additional benefits; see also https://jobs.aau.at/en/the-university-as-employer/
  • Diversity- and family-friendly university culture
  • The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature, and sports

The application:

If you are interested in this position, please apply in German or English by providing the following documents:

  • Letter of motivation
  • Curriculum vitae 
  • Copies of degree certificates and confirmations
  • Proof of all completed higher education programs 
  • Concept of a (potential) dissertation project (one-page maximum)

The University of Klagenfurt is aware of its social responsibility even during COVID-19. This is reflected by the high proportion of fully immunized persons among students and employees. For this reason, a continued willingness to be vaccinated in connection with COVID-19 is expected upon entering university employment.

The University of Klagenfurt aims to increase the proportion of women and explicitly invites qualified women to apply for the position. Where the qualification is equivalent, women will receive preferential consideration. 

People with disabilities or chronic diseases, who fulfill the requirements, are particularly encouraged to apply. 

Travel and accommodation costs incurred during the application process will not be refunded.

Submit all relevant documents, including copies of all school certificates and performance records, by email (see contact information below).

Application deadline: December 19, 2022.

Contact information:

Klagenfurt, situated at the beautiful Lake Wörthersee – one of the largest and warmest alpine lakes in Europe – has nearly 100,000 inhabitants. Being a small city with a Renaissance-style city center reflecting 800 years of history and Italian influence, Klagenfurt is a pleasant place to live and work. The university is located only about 1.5 kilometers east of Lake Wörthersee and about 3 kilometers west of the city center.


Sunday, November 6, 2022

ARARAT: A Collaborative Edge-Assisted Framework for HTTP Adaptive Video Streaming

 IEEE Transactions on Network and Service Management (TNSM)

Journal Website

[PDF]

Reza Farahani (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Austria), Mohammad Shojafar (University of Surrey, UK), Christian Timmerer (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Austria), Farzad Tashtarian (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Austria), Mohammad Ghanbari (University of Essex, UK), and Hermann Hellwagner (Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: With the ever-increasing demands for high-definition and low-latency video streaming applications, network-assisted video streaming schemes have become a promising complementary solution in the HTTP Adaptive Streaming (HAS) context to improve users’ Quality of Experience (QoE) as well as network utilization. Edge computing is considered one of the leading networking paradigms for designing such systems by providing video processing and caching close to the end-users. Despite the wide usage of this technology, designing network-assisted HAS architectures that support low-latency and high-quality video streaming, including edge collaboration, is still a challenge. To address these issues, this article leverages the Software-Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing paradigms to propose A collaboRative edge-Assisted framewoRk for HTTP Adaptive video sTreaming (ARARAT). Aiming at minimizing HAS clients’ serving time and network cost, besides considering available resources and all possible serving actions, we design a multi-layer architecture and formulate the problem as a centralized optimization model executed by the SDN controller. However, to cope with the high time complexity of the centralized model, we introduce three heuristic approaches that produce near-optimal solutions through efficient collaboration between the SDN controller and edge servers. Finally, we implement the ARARAT framework, conduct our experiments on a large-scale cloud-based testbed including 250 HAS players, and compare its effectiveness with state-of-the-art systems within comprehensive scenarios. The experimental results illustrate that the proposed ARARAT methods (i) improve users’ QoE by at least 47%, (ii) decrease the streaming cost, including bandwidth and computational costs, by at least 47%, and (iii) enhance network utilization by at least 48% compared to state-of-the-art approaches.
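As a rough intuition for the trade-off ARARAT's optimization model captures (the paper's actual formulation is a centralized model executed by the SDN controller), the toy sketch below scores hypothetical serving actions by a weighted sum of serving time and network cost; all action names and numbers are invented.

```python
def best_action(segment_mbit, actions, alpha=0.5):
    """Pick the serving action minimizing a weighted sum of expected serving
    time and per-request network cost (a toy stand-in for ARARAT's
    centralized optimization model; all numbers are invented)."""
    def cost(a):
        serving_time = segment_mbit / a["throughput_mbps"] + a.get("transcode_s", 0.0)
        return alpha * serving_time + (1 - alpha) * a["cost_per_request"]
    return min(actions, key=cost)

actions = [
    {"name": "origin-fetch",   "throughput_mbps": 50,  "cost_per_request": 1.0},
    {"name": "edge-cache-hit", "throughput_mbps": 400, "cost_per_request": 0.1},
    {"name": "edge-transcode", "throughput_mbps": 400, "transcode_s": 0.4,
     "cost_per_request": 0.3},
]
# For an 8 Mbit segment, serving a cached copy at the edge wins.
print(best_action(segment_mbit=8, actions=actions)["name"])
```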

Index Terms—HTTP Adaptive Streaming (HAS), Network-Assisted Video Streaming, Software-Defined Networking (SDN), Network Function Virtualization (NFV), Edge Computing, Edge Collaboration, Video Transcoding.

Acknowledgements: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/

Tuesday, November 1, 2022

DoFP+: An HTTP/3-based Adaptive Bitrate Approach Using Retransmission Techniques

IEEE Access
[PDF]

Minh Nguyen*, Daniele Lorenzi*, Farzad Tashtarian, Hermann Hellwagner, Christian Timmerer
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

(*) Minh Nguyen and Daniele Lorenzi contributed equally to this work


Abstract: HTTP Adaptive Streaming (HAS) solutions use various adaptive bitrate (ABR) algorithms to select suitable video qualities with the objective of coping with the variations of network connections. HTTP has been evolving with various versions and provides more and more features. Most of the existing ABR algorithms do not significantly benefit from the development of HTTP when they merely run over its most recent version. An open research question is “How can new features of the recent HTTP versions be used to enhance the performance of HAS?” To address this question, in this paper, we introduce Days of Future Past+ (DoFP+ for short), a heuristic algorithm that takes advantage of the features of the latest HTTP version, HTTP/3, to provide high Quality of Experience (QoE) to the viewers. DoFP+ leverages HTTP/3 features, including (i) stream multiplexing, (ii) stream priority, and (iii) request cancellation to upgrade low-quality segments in the player buffer while downloading the next segment. The qualities of those segments are selected based on an objective function and throughput constraints. The objective function takes into account two factors, namely the (i) average bitrate and the (ii) video instability of the considered set of segments. We also examine different strategies of download order for those segments to optimize the QoE in limited-resources scenarios. The experimental results show an improvement in QoE by up to 33% while the number of stalls and stall duration for DoFP+ are reduced by 86% and 92%, respectively, compared to state-of-the-art ABR schemes. In addition, DoFP+ saves on average up to 16% downloaded data across all test videos. Also, we find that downloading segments sequentially brings more benefits for retransmissions than concurrent downloads, and lower-quality segments should be upgraded before other segments to gain more QoE improvement. Our source code has been published for reproducibility at https://github.com/cd-athena/DoFP-Plus.
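To illustrate the flavor of the objective DoFP+ optimizes (the paper's exact formulation may differ), the toy sketch below picks per-segment quality upgrades for buffered segments that maximize average bitrate minus an instability penalty, subject to a throughput budget. Bitrates are treated as kbit per one-second segment, and all numbers are invented.

```python
from itertools import product

def instability(bitrates):
    """Sum of absolute quality switches between consecutive segments."""
    return sum(abs(b - a) for a, b in zip(bitrates, bitrates[1:]))

def plan_upgrades(current, ladder, budget_kbit, w=0.5):
    """Toy stand-in for the DoFP+ objective: choose per-segment qualities
    (each at least the current one) maximizing average bitrate minus weighted
    instability, subject to the re-download volume fitting the budget."""
    best, best_score = current, None
    for cand in product(*[[b for b in ladder if b >= c] for c in current]):
        extra = sum(n - c for n, c in zip(cand, current))  # kbit to re-download
        if extra > budget_kbit:
            continue
        score = sum(cand) / len(cand) - w * instability(list(cand))
        if best_score is None or score > best_score:
            best, best_score = list(cand), score
    return best

ladder = [1000, 2500, 5000]  # kbps representations of the bitrate ladder
# Smoothing all three buffered segments to 2500 kbps beats a single 5000 spike.
print(plan_upgrades([1000, 2500, 1000], ladder, budget_kbit=4500))
```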

Keywords: HTTP/3, ABR algorithm, QoE, HAS, DASH

Acknowledgements: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/

Saturday, October 29, 2022

Perceptually-aware Per-title Encoding for Adaptive Video Streaming

 2022 IEEE International Conference on Multimedia and Expo (ICME)

July 18-22, 2022 | Taipei, Taiwan

Conference Website

[PDF][Slides][Video]

Vignesh V Menon, Hadi Amirpour, Mohammad Ghanbari, and Christian Timmerer
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: In live streaming applications, typically a fixed set of bitrate-resolution pairs (known as bitrate ladder) is used for simplicity and efficiency in order to avoid the additional encoding run-time required to find optimum resolution-bitrate pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on Just Noticeable Difference (JND) in quality perception to avoid adding perceptually similar representations in the bitrate ladder. To this end, Discrete Cosine Transform (DCT) energy-based low-complexity spatial and temporal features for each video segment are used. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% to maintain the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder without any noticeable additional latency in streaming, accompanied by a 30.69% cumulative decrease in storage space for various representations.
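To give a flavor of the DCT-energy features PPTE builds on (the paper's exact feature definitions may differ), the sketch below computes a simple block-wise DCT-energy measure of spatial complexity, with the DC coefficient excluded so plain brightness does not count as texture.

```python
import numpy as np
from scipy.fft import dctn

def spatial_dct_energy(frame, block=32):
    """Low-complexity spatial feature in the spirit of PPTE: average
    block-wise DCT energy of the luma plane, excluding the DC coefficient.
    (Illustrative only; the paper's exact feature definition may differ.)"""
    h, w = frame.shape
    energies = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(frame[y:y+block, x:x+block].astype(np.float64),
                          norm="ortho")
            coeffs[0, 0] = 0.0  # drop DC: brightness is not texture
            energies.append(np.abs(coeffs).sum())
    return float(np.mean(energies))

# A flat frame has near-zero texture energy; random noise has high energy.
flat  = np.full((64, 64), 128, dtype=np.uint8)
noisy = np.random.default_rng(0).integers(0, 256, (64, 64), dtype=np.uint8)
print(spatial_dct_energy(flat), spatial_dct_energy(noisy))
```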

 

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Friday, October 28, 2022

OPSE: Online Per-Scene Encoding for Adaptive HTTP Live Streaming

 2022 IEEE International Conference on Multimedia and Expo (ICME)
Industry & Application Track

July 18-22, 2022 | Taipei, Taiwan

Conference Website

[PDF][Slides][Video]

Vignesh V Menon, Hadi Amirpour, Christian Feldmann (Bitmovin, Austria), Mohammad Ghanbari, and Christian Timmerer
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

Abstract: In live streaming applications, typically a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is used during the entire streaming session in order to avoid the additional latency to find scene transitions and optimized bitrate-resolution pairs for every video content. However, an optimized bitrate ladder per scene may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces an Online Per-Scene Encoding (OPSE) scheme for adaptive HTTP live streaming applications. In this scheme, scene transitions and optimized bitrate-resolution pairs for every scene are predicted using Discrete Cosine Transform (DCT) energy-based low-complexity spatial and temporal features. Experimental results show that, on average, OPSE yields bitrate savings of up to 48.88% in certain scenes to maintain the same VMAF, compared to the reference HTTP Live Streaming (HLS) bitrate ladder without any noticeable additional latency in streaming.
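As a toy illustration of online scene-transition detection from low-complexity temporal features (OPSE's actual DCT-feature-based predictor differs), the sketch below flags a scene cut when a frame's temporal feature exceeds the running statistics of the current scene; all numbers are invented.

```python
import numpy as np

def scene_transitions(temporal_features, k=3.0):
    """Toy sketch of online scene-cut detection: flag frame i as a scene
    transition when its temporal feature (e.g., the energy of the frame
    difference) exceeds the running mean by k standard deviations."""
    cuts, hist = [], []
    for i, f in enumerate(temporal_features):
        if len(hist) >= 8:
            mu, sigma = np.mean(hist), np.std(hist) + 1e-9
            if f > mu + k * sigma:
                cuts.append(i)
                hist.clear()  # restart the statistics for the new scene
        hist.append(f)
    return cuts

# Two abrupt jumps in the temporal feature mark two scene cuts.
feats = [1.0] * 20 + [9.0] + [1.2] * 20 + [8.5] + [1.1] * 10
print(scene_transitions(feats))  # -> [20, 41]
```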

 

Figure: The bitrate ladder prediction envisioned using OPSE.

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Thursday, October 27, 2022

Quality Optimization of Live Streaming Services over HTTP with Reinforcement Learning

 

IEEE Global Communications Conference 2021
7-11 December 2021 // Madrid, Spain // Hybrid: In-Person and Virtual Conference
Connecting Cultures around the Globe
https://globecom2021.ieee-globecom.org/

[PDF][Slides][Video]

F. Tashtarian*, R. Falanji‡, A. Bentaleb+, A. Erfanian*, P. S. Mashhadi§,
C. Timmerer*, H. Hellwagner*, R. Zimmermann+
Christian Doppler Laboratory ATHENA, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria*
Department of Mathematical Science, Sharif University of Technology, Tehran, Iran‡
Department of Computer Science, School of Computing, National University of Singapore (NUS)+
Center for Applied Intelligent Systems Research (CAISR), Halmstad University, Sweden§

 

Abstract: Recent years have seen tremendous growth in HTTP adaptive live video traffic over the Internet. In the presence of highly dynamic network conditions and diverse request patterns, existing yet simple hand-crafted heuristic approaches for serving client requests at the network edge might incur a large overhead and significant increase in time complexity. Therefore, these approaches might fail in delivering acceptable Quality of Experience (QoE) to end users. To bridge this gap, we propose ROPL, a learning-based client request management solution at the edge that leverages the power of the recent breakthroughs in deep reinforcement learning, to serve requests of concurrent users joining various HTTP-based live video channels. ROPL is able to react quickly to any changes in the environment, making accurate decisions to serve client requests, which results in achieving satisfactory user QoE. We validate the efficiency of ROPL through trace-driven simulations and a real-world setup. Experimental results from real-world scenarios confirm that ROPL outperforms existing heuristic-based approaches in terms of QoE, by a factor of up to 3.7×.
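ROPL itself uses deep reinforcement learning; as a minimal, invented illustration of learning serving decisions from rewards, the sketch below trains a tabular, bandit-style policy that learns when to serve from cache, fetch from the origin, or transcode nearby. States, actions, and rewards are all hypothetical.

```python
import random

ACTIONS = ["serve_cached", "fetch_origin", "transcode_nearby"]
STATES = [(c, l) for c in (True, False) for l in (0.0, 0.5, 1.0)]  # (cached?, origin load)

def reward(state, action, rng):
    """Invented reward model: cache hits are best, origin fetches degrade
    with origin load (plus noise), transcoding is a steady middle ground."""
    cached, origin_load = state
    if action == "serve_cached":
        return 1.0 if cached else -1.0           # a cache miss hurts QoE
    if action == "fetch_origin":
        return 0.5 - 0.5 * origin_load + rng.uniform(-0.1, 0.1)
    return 0.3

def train(rounds=2000, eps=0.1, lr=0.2):
    """Epsilon-greedy, one-step (bandit-style) Q-value updates per state."""
    rng = random.Random(0)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    for _ in range(rounds):
        for s in STATES:
            a = rng.choice(ACTIONS) if rng.random() < eps else max(q[s], key=q[s].get)
            q[s][a] += lr * (reward(s, a, rng) - q[s][a])
    return q

q = train()
for s in STATES:  # learned policy: cache hit -> serve; loaded origin -> transcode
    print(s, "->", max(q[s], key=q[s].get))
```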

Index Terms—Network Edge; Request Serving; HTTP Live Streaming; Low Latency; QoE; Deep Reinforcement Learning.

Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.