Sunday, December 23, 2018

What happened in multimedia communication in 2018?

In January 2018 I wrote a blog post entitled "What to care about in multimedia communication in 2018?" and I think it's worth looking back to see what actually happened with respect to next generation video coding formats and adaptive streaming techniques.

In April 2018, the responses to the call for proposals for the next standard in video compression were evaluated, and a first working draft and test model for the Versatile Video Coding (VVC) standard were approved. Already at this point, some proposals demonstrated compression efficiency gains of typically 40% or more compared to HEVC. Currently, working draft 3 and test model 3 of VVC (VTM 3) are available and we may certainly expect compression efficiency gains well beyond the targeted 50% for the final standard. An overview of VVC can be found here (by C. Feldmann) and here (by M. Wien). The licensing issues have been acknowledged and, thus, the Media Coding Industry Forum (MC-IF) has been established.

At the beginning of 2018, everyone was also very curious about AOMedia and AV1. Version 1 of the specification has finally become available and, in the meantime, it has been implemented/deployed on both the content provisioning/encoding side (e.g., Bitmovin) and the content consumption/decoding side (e.g., Chrome, Firefox). In this context, we also published a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 (VVC will be added at a later stage). In general, however, we are entering the era of multiple video codecs deployed in products and services, a trend also confirmed by Bitmovin's latest video developer survey.

MPEG-DASH 3rd edition has been approved and is awaiting publication, which I expect to happen in 2019. An overview of the MPEG-DASH status is shown in the figure below.
In this context, the DASH-IF produced various vital assets such as interoperability guidelines (latest v4.3, content protection, ATSC 3.0, SAND), test vectors, conformance tools, and a reference client. For informative aspects of MPEG-DASH, such as bitrate adaptation schemes, the interested reader is referred to our survey. This survey gives an overview of existing techniques (see figure below) and also outlines future research directions. It is available open access, free for everyone.
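The bitrate adaptation schemes surveyed there can be illustrated with a simple throughput-based heuristic; the bitrate ladder and the safety margin below are illustrative assumptions of mine, not values from any particular DASH player or from the survey.

```python
# Minimal throughput-based bitrate adaptation sketch (illustrative only).
# The representation ladder and the 0.8 safety margin are assumptions,
# not values from any particular DASH player.

LADDER_KBPS = [400, 1000, 2500, 5000, 8000]  # hypothetical bitrate ladder
SAFETY = 0.8  # spend only 80% of the measured throughput

def select_representation(throughput_kbps, ladder=LADDER_KBPS, safety=SAFETY):
    """Pick the highest bitrate not exceeding the discounted throughput."""
    budget = throughput_kbps * safety
    eligible = [r for r in ladder if r <= budget]
    return eligible[-1] if eligible else ladder[0]

# Example: a smoothed throughput estimate of 3.2 Mbit/s selects 2500 kbit/s.
print(select_representation(3200))  # → 2500
```

Real players refine this basic rule with buffer-level feedback, throughput smoothing, and stability constraints, which is exactly the design space the survey maps out.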

Finally, I mentioned a couple of scientific events in 2018 including QoMEX, MMSys (NOSSDAV, PV), ICME, ICIP, PCS, and MIPR. I attended all of them (except PCS), each showing advances in its respective field. These events are probably worth attending in 2019 as well, and I will certainly blog about this early next year. However, I'd like to hear your opinion on what happened in 2018 and what we may expect in 2019...

Friday, December 14, 2018

Christian Doppler Research Association approves ATHENA project proposal

ATHENA stands for Adaptive Streaming over HTTP and Emerging Networked Multimedia Services and has been jointly proposed by the Institute of Information Technology (ITEC) at Alpen-Adria-Universität Klagenfurt (AAU) and Bitmovin GmbH to address current and future research and deployment challenges of HTTP adaptive streaming (HAS) and emerging streaming methods.

AAU (ITEC) has been working on adaptive video streaming for more than a decade, has a proven record of successful research projects and publications in the field, and has been actively contributing to MPEG standardization for many years, including MPEG-DASH; Bitmovin is a video streaming software company founded by ITEC researchers in 2013 and has developed highly successful, global R&D and sales activities and a world-wide customer base since then. 

The aim of ATHENA is to research and develop novel paradigms, approaches, (prototype) tools and evaluation results for the areas (1) multimedia content provisioning (i.e., video coding), (2) content delivery (i.e., multimedia networking) and (3) content consumption (i.e., HAS player aspects) in the media delivery chain as well as for (4) end-to-end aspects, with a focus on, but not limited to, HTTP Adaptive Streaming (HAS). The new approaches and insights should enable Bitmovin to build innovative applications and services that account for the steadily increasing and changing multimedia traffic on the Internet.

The project has been approved by the Christian Doppler Research Association as a CD pilot laboratory -- the first such project at Alpen-Adria-Universität Klagenfurt -- with an initial duration of two years and a five-year extension upon successful review (i.e., seven years in total). Thus, stay tuned for details and, yes, I'm hiring PhD students for the areas above (a detailed job description will be published soon).

Tuesday, December 11, 2018

Future of Video Codec Licensing: Avoiding the Tragedy of the Commons

Media Coding Industry Forum (MC-IF) announces workshop on codec patent licensing:

Future of Video Codec Licensing:
Avoiding the tragedy of the commons

January 7th 2019, Sunnyvale, California
Admission is complimentary, but Registration is Required 

Are you worried about the future of media codec licensing? Would you like to find out more about ideas and initiatives to create an effective patent licensing landscape for media technologies? Are you interested in hearing about the business needs for more efficient data compression methods and how such methods can be brought to market?

Join the Media Coding Industry Forum (MC-IF) and its members at an open workshop: "Future of Video Codec Licensing: avoiding the tragedy of the commons" with a reference to the concept that individual actions in a group, while individually rational or even optimal, might result in an outcome that is far from optimal for any member, and generally undesirable for the group. How can such a situation be avoided? The workshop will engage attendees in a frank and open discussion of the needs, desires, and issues, and include speakers from major players in patent pool licensing, implementers, licensors, broadcasting, and delivery, covering the entire video compression ecosystem.

The workshop will follow an open meeting and consist of panel sessions and discussion as follows:

10am-1pm Open meeting, incl. Lunch

1-2pm Industry Needs and Opportunities for the Ecosystem
  • Ben Waggoner (Amazon)
  • Michael Robinson (AT&T)
  • Lynn Comp (Intel)
  • Jonatan Samuelsson (Divideon)
  • Mod: Jan Ozer (Streaming Learning Center) 
2:30-3:30 Roadblocks, Impediments - and Bulldozers
  • Stephan Wenger (Tencent)
  • Tom Vaughn (Beamr)
  • Stefan Lederer (Bitmovin)
  • Jeremy Rosenberg (Harmonic)
  • Mod: Shawn Ambwani (Unified Patents) 
4-5pm View of Licensors
  • Larry Horn (MPEG-LA)
  • Hasan Rashid (HEVC Advance)
  • Greg Weiss (Velos Media)
  • Robert Gray (Nokia)
  • Mod: Brian Love (Santa Clara Univ Law School)
5pm Wrap-up Discussion

5:30-6:30 Reception
The workshop is open to anyone, but we are asking the press not to attend the workshop. It will operate under the Chatham House Rule. Please note that this is primarily an ecosystem event, not a technology event. Attendance is particularly encouraged from those in the licensee-licensor relationship, and from those building businesses that use licensed media standards.

During the same day, January 7th, there will be an open meeting of MC-IF from 10AM to 1PM, including lunch, where information about the forum will be presented and participants will be able to interact with representatives from MC-IF, to ask questions and provide feedback. The workshop follows lunch (provided), from 1:00PM to 5:30PM PST. The day ends with a reception (starting at around 5:30PM).

Both events are open to everyone and free of charge; anti-trust counsel will be present at both. Seating is limited so please register as soon as possible! Pre-registration is required.

Location: CableLabs, 400 W California Ave, Sunnyvale, CA 94086

The Media Coding Industry Forum was formed in 2018 to specifically focus on non-technical aspects of media coding standard deployment with an initial focus on the Versatile Video Coding (VVC) standard that is under development in a joint effort of ISO/IEC and ITU-T.

MC-IF is pleased with its continued rapid membership growth. Apple, Ericsson, Intel, Nokia, Sony, Tencent, and many others have all joined the MC-IF to collectively search for improvements—for all parties—in the media codec licensing ecosystem. To become a part of this important effort and join MC-IF, go to the MC-IF website.

Tuesday, October 30, 2018

MPEG news: a report from the 124th meeting, Macau, China

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.

The MPEG press release comprises the following aspects:
  • Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage
  • Compressed Representation of Neural Networks - MPEG issues Call for Proposals
  • Low Complexity Video Coding Enhancements - MPEG issues Call for Proposals
  • New Video Coding Standard expected to have licensing terms timely available - MPEG issues Call for Proposals
  • Multi-Image Application Format (MIAF) promoted to Final Draft International Standard
  • 3DoF+ Draft Call for Proposal goes Public

Point Cloud Compression – MPEG promotes a video-based point cloud compression technology to the Committee Draft stage

At its 124th meeting, MPEG promoted its Video-based Point Cloud Compression (V-PCC) standard to Committee Draft (CD) stage. V-PCC addresses lossless and lossy coding of 3D point clouds with associated attributes such as colour. By leveraging existing video codecs and video ecosystems in general (hardware acceleration, transmission services and infrastructure), as well as future video codecs, the V-PCC technology enables new applications. The current V-PCC encoder implementation provides a compression ratio of 125:1, which means that a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality.
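The reported numbers can be sanity-checked with back-of-the-envelope arithmetic; the 25 fps frame rate assumed below is my own illustrative assumption, not a figure from the MPEG press release.

```python
# Back-of-the-envelope check of the reported V-PCC figures (illustrative).
# Assumption: the 8 Mbit/s figure is the compressed stream, so the implied
# raw rate is simply 8 Mbit/s * 125.

compressed_mbps = 8
ratio = 125
raw_mbps = compressed_mbps * ratio
print(raw_mbps)  # → 1000, i.e., ~1 Gbit/s of raw point cloud data

# With 1 million points per frame and an assumed 25 fps, the raw budget
# works out to 40 bits per point (e.g., geometry plus colour attributes),
# which is a plausible range for uncompressed dynamic point clouds.
points = 1_000_000
fps = 25
bits_per_point = raw_mbps * 1_000_000 / fps / points
print(bits_per_point)  # → 40.0
```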

A next step is the storage of V-PCC in ISOBMFF for which a working draft has been produced. It is expected that further details will be discussed in upcoming reports.
Research aspects: Video-based Point Cloud Compression (V-PCC) is at CD stage and a first working draft for the storage of V-PCC in ISOBMFF has been provided. Thus, a natural next step is the delivery of V-PCC content encapsulated in ISOBMFF over networks utilizing various approaches, protocols, and tools. Additionally, one may also consider different encapsulation formats if needed. I hope to see some of these aspects covered in future conferences, including -- but not limited to -- those listed at the very end of this blog post.

MPEG issues Call for Proposals on Compressed Representation of Neural Networks

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, media coding, data analytics, and many other fields. Their recent success is based on the feasibility of processing much larger and more complex neural networks (deep neural networks, DNNs) than in the past, and the availability of large-scale training data sets. Some applications require the deployment of a particular trained network instance to a potentially large number of devices and, thus, could benefit from a standard for the compressed representation of neural networks. Therefore, MPEG has issued a Call for Proposals (CfP) for compression technology for neural networks, focusing on the compression of parameters and weights for four use cases: (i) visual object classification, (ii) audio classification, (iii) visual feature extraction (as used in MPEG CDVA), and (iv) video coding.
Research aspects: As pointed out last time, research here will mainly focus on compression efficiency for both lossy and lossless scenarios. Additionally, communication aspects, such as the transmission of compressed artificial neural networks within lossy, large-scale environments including update mechanisms, may become relevant in the (near) future.
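A minimal example of the kind of lossy parameter compression the CfP targets is uniform 8-bit quantization of 32-bit float weights; the sketch below is a generic textbook technique, not any particular response to the CfP.

```python
# Uniform 8-bit quantization of neural network weights (generic sketch,
# not a technique from the MPEG CfP responses).

def quantize(weights, levels=256):
    """Map float weights to integer indices in [0, levels-1] (lossy)."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    return [round((w - lo) / step) for w in weights], lo, step

def dequantize(indices, lo, step):
    """Reconstruct approximate float weights from quantized indices."""
    return [lo + i * step for i in indices]

weights = [-0.51, -0.12, 0.0, 0.07, 0.48]  # toy weight vector
q, lo, step = quantize(weights)
rec = dequantize(q, lo, step)

# Storing 8-bit indices instead of 32-bit floats yields ~4:1 compression
# (ignoring the two floats of side information, lo and step); the price is
# a reconstruction error bounded by the quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, rec))
print(q)  # → [0, 100, 131, 149, 255]
```

Entropy coding of the indices, pruning, and lossless residual coding would push the ratio further, which is where the actual research on compression efficiency comes in.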

MPEG issues Call for Proposals on Low Complexity Video Coding Enhancements

Upon request from the industry, MPEG has identified an area of interest in which video technology deployed in the market (e.g., AVC, HEVC) can be enhanced in terms of video quality without the need to necessarily replace existing hardware. Therefore, MPEG has issued a Call for Proposals (CfP) on Low Complexity Video Coding Enhancements.

The objective is to develop video coding technology with a data stream structure defined by two component streams: a base stream decodable by a hardware decoder and an enhancement stream suitable for software processing implementation. The project is meant to be codec agnostic; in other words, the base encoder and base decoder can be AVC, HEVC, or any other codec in the market.
Research aspects: The interesting aspect here is that this use case assumes a legacy base decoder - most likely realized in hardware - which is enhanced with a software-based implementation to improve coding efficiency and/or quality without sacrificing the capabilities of the end-user device in terms of complexity and, thus, energy efficiency, thanks to the software-based solution.
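The two-stream structure can be illustrated with a toy reconstruction loop; the nearest-neighbour upsampling and additive residuals below are my own generic assumptions for illustration, not the actual technology sought by the CfP.

```python
# Toy illustration of a base + enhancement stream structure (generic sketch;
# not the actual CfP technology). The base layer is a low-resolution frame,
# as a legacy hardware decoder might produce it, and the enhancement stream
# carries full-resolution residual corrections applied in software.

def upsample_2x(frame):
    """Nearest-neighbour 2x upsampling of a 2D list of pixel values."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                     # duplicate rows (copy)
    return out

def reconstruct(base_frame, residuals):
    """Add enhancement-layer residuals to the upsampled base-layer frame."""
    up = upsample_2x(base_frame)
    return [[p + r for p, r in zip(ur, rr)] for ur, rr in zip(up, residuals)]

base = [[10, 20], [30, 40]]          # decoded base layer (low resolution)
residuals = [[0, 1, -1, 0],
             [2, 0, 0, -2],
             [0, 0, 1, 1],
             [-1, 0, 0, 0]]          # enhancement-layer corrections
print(reconstruct(base, residuals))
```

Because the base layer is just a regular decoded frame, the scheme is codec agnostic: the base could come from an AVC, HEVC, or any other decoder, exactly as the CfP requires.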

MPEG issues Call for Proposals for a New Video Coding Standard expected to have licensing terms timely available

At its 124th meeting, MPEG issued a Call for Proposals (CfP) for a new video coding standard to address combinations of both technical and application (i.e., business) requirements that may not be adequately met by existing standards. The aim is to provide a standardized video compression solution which combines coding efficiency similar to that of HEVC with a level of complexity suitable for real-time encoding/decoding and the timely availability of licensing terms.
Research aspects: This new work item is more related to business aspects (i.e., licensing terms) than technical aspects of video coding. As this blog is about technical aspects and I'm also not an expert in licensing terms, I do not comment on this any further.

Multi-Image Application Format (MIAF) promoted to Final Draft International Standard

The Multi-Image Application Format (MIAF) defines interoperability points for the creation, reading, parsing, and decoding of images embedded in the High Efficiency Image File (HEIF) format by (i) only defining additional constraints on the HEIF format, (ii) limiting the supported encoding types to a set of specific profiles and levels, (iii) requiring specific metadata formats, and (iv) defining a set of brands for signaling such constraints, including specific depth map and alpha plane formats. For instance, it addresses use cases where a capturing device uses one of the HEIF codecs with a specific HEVC profile and level in its created HEIF files, while a playback device is only capable of decoding AVC bitstreams.
Research aspects: MIAF is an application format which is defined as a combination of tools (incl. profiles and levels) of other standards (e.g., audio codecs, video codecs, systems) to address the needs of a specific application. Thus, the research is related to use cases enabled by this application format.

3DoF+ Draft Call for Proposal goes Public

Following investigations on the coding of “three Degrees of Freedom plus” (3DoF+) content in the context of MPEG-I, the MPEG video subgroup has provided evidence demonstrating the capability to encode a 3DoF+ content efficiently while maintaining compatibility with legacy HEVC hardware. As a result, MPEG decided to issue a draft Call for Proposal (CfP) to the public containing the information necessary to prepare for the final Call for Proposal expected to occur at the 125th MPEG meeting (January 2019) with responses due at the 126th MPEG meeting (March 2019).
Research aspects: This work item is about video (coding) and, thus, research is about compression efficiency.

What else happened at #MPEG124?

  • MPEG-DASH 3rd edition is still in the final editing phase and not yet available. Last time, I wrote that we expect final publication later this year or early next year and we hope this is still the case. At this meeting, Amendment 5 progressed to DAM and conformance/reference software for SRD, SAND, and Server Push was also promoted to DAM. In other words, DASH is pretty much in maintenance mode.
  • MPEG-I (systems part) is working on immersive media access and delivery and I guess more updates will come on this after the next meeting. OMAF is working on a 2nd edition for which a working draft exists and phase 2 use cases (public document) and draft requirements are discussed.
  • Versatile Video Coding (VVC): working draft 3 (WD3) and test model 3 (VTM3) have been issued at this meeting, including a large number of new tools. Both documents (and software) will be publicly available after the editing periods (Nov. 23 for WD3 and Dec. 14 for VTM3). JVET documents are publicly available here.

Last but not least, some ads...

Tuesday, October 9, 2018

AAU and Bitmovin presenting at IEEE ICIP 2018

The IEEE International Conference on Image Processing (ICIP) is, with more than 1,000 attendees, one of the biggest conferences of the Signal Processing Society. At ICIP'18, Anatoliy (AAU) and myself (AAU/Bitmovin) attended with the following presentations:

On Monday, October 8, I was on the panel of the Young Professional Networking Event (together with Amy Reibman and Sheila Hemami) sharing my experiences with all attendees. See one picture here.

On Tuesday, October 9, I presented at the Innovation Program talking about "Video Coding for Large-Scale HTTP Adaptive Streaming Deployments: State of the Art and Challenges Ahead".

On Wednesday, October 10, Anatoliy presented our joint AAU/Bitmovin paper about "A Practical Evaluation of Video Codecs for Large-Scale HTTP Adaptive Streaming Services". Abstract: The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today’s internet traffic. Although the internet bandwidth increases also constantly, audio-visual compression technology is inevitable and we are currently facing the challenge to be confronted with multiple video codecs. This paper provides a practical evaluation of state-of-the-art video codecs (i.e., AV1, AVC/libx264, HEVC/libx265, VP9/libvpx-vp9) for large-scale HTTP adaptive streaming services. In anticipation of the results, AV1 shows promising performance compared to established video codecs. Additionally, AV1 is intended to be royalty free making it worthwhile to be considered for large-scale HTTP adaptive streaming services.

A Practical Evaluation of Video Codecs for Large-Scale HTTP Adaptive Streaming Services from Christian Timmerer

Acknowledgment: This work was supported in part by the Austrian Research Promotion Agency (FFG) under the Next Generation Video Streaming project “PROMETHEUS”.

Tuesday, October 2, 2018

Almost 58 percent of downstream traffic on the internet is video

The Global Internet Phenomena Report provided by Sandvine - together with Cisco's Visual Networking Index (VNI) - has always been a good source on how internet traffic evolves over time, specifically in the context of streaming audio and video content (note: Nielsen's Law of Internet Bandwidth is also worth noting here, as well as Bitmovin's Video Developer Survey). I used this report in many of my presentations to highlight the 'importance of multimedia delivery'. Thus, I'm happy to see that on October 2, 2018 Sandvine released a new version of its Global Internet Phenomena Report after a rather long break of two years.

The report is available here with some highlights reported below.

Almost 58% of downstream traffic on the internet is video, and Netflix alone accounts for 15% of the total downstream traffic across the entire internet.

The streaming video traffic share shows some regional differences (see figure below). Netflix dominates video streaming in the Americas (30.71%), whereas EMEA is led by YouTube (30.39%) and APAC is not dominated by any single streaming service but by generic "HTTP media stream" traffic (29.24%). Overall, Netflix and YouTube together are still responsible for approx. 50% of the global video streaming traffic share.
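The global figures are easy to cross-check with simple arithmetic: if video is ~58% of downstream traffic and Netflix alone is 15% of the total, Netflix's share of video traffic follows directly.

```python
# Cross-check of the Sandvine figures with simple arithmetic.
video_share_of_total = 0.58    # ~58% of downstream traffic is video
netflix_share_of_total = 0.15  # Netflix is 15% of total downstream traffic

netflix_share_of_video = netflix_share_of_total / video_share_of_total
print(round(netflix_share_of_video * 100, 1))  # → 25.9
```

So Netflix alone carries roughly a quarter of all video traffic globally, consistent with its regional dominance in the Americas and the combined ~50% Netflix+YouTube share of video streaming.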

More to come later...

Monday, September 24, 2018

2018 Video Developer Survey

Bitmovin's 2018 Video Developer Survey reveals interesting details about

  • streaming formats (MPEG-DASH, HLS, CMAF, etc.),
  • video codecs (AVC, HEVC, VP9, AV1),
  • audio codecs (AAC, MP3, Dolby, etc.),
  • encoding preferences (hardware, software on-premise/cloud),
  • players (open source, commercial, in-house solution),
  • DRM,
  • monetization model,
  • ad standard/technology, and
  • the biggest problems experienced today (DRM, ad-blockers, ads in general, server-side ad insertion, CDN issues, broadcast delay/latency, getting playback to run on all devices).
For example, the figure below illustrates the planned video codec usage in the next 12 months compared to the 2017 report.
Planned video codec usage in the next 12 months 2017 vs. 2018.
In total, 456 survey submissions from over 67 countries have been received and included in the report, which can be downloaded here for free.