Monday, May 22, 2023

Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming

 2022 Picture Coding Symposium (PCS)

December 7-9, 2022 | San Jose, CA, USA

Conference Website
[PDF][Slides]

Vignesh V Menon (Alpen-Adria-Universität Klagenfurt),  Hadi Amirpour (Alpen-Adria-Universität Klagenfurt), Prajit T Rajendran (Universite Paris-Saclay, France), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt)

Abstract: In live streaming applications, a fixed set of bitrate-resolution pairs (known as bitrate ladder) is generally used to avoid additional pre-processing run-time to analyze the complexity of every video content and determine the optimized bitrate ladder. Furthermore, live encoders use the fastest available preset for encoding to ensure the minimum possible latency in streaming. For live encoders, it is expected that the encoding speed is equal to the video framerate. However, an optimized encoding preset may result in (i) increased Quality of Experience (QoE) and (ii) improved CPU utilization while encoding. In this light, this paper introduces a Content-Adaptive encoder Preset prediction Scheme (CAPS) for adaptive live video streaming applications. In this scheme, the encoder preset is determined using Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features for every video segment, the number of CPU threads allocated for each encoding instance, and the target encoding speed. Experimental results show that CAPS yields an overall quality improvement of 0.83 dB PSNR and 3.81 VMAF with the same bitrate, compared to the fastest preset encoding of the HTTP Live Streaming (HLS) bitrate ladder using x265 HEVC open-source encoder. This is achieved by maintaining the desired encoding speed and reducing CPU idle time.

Acknowledgements: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

Friday, May 19, 2023

IEEE TIP: Advanced Scalability for Light Field Image Coding

Advanced Scalability for Light Field Image Coding

IEEE Transactions on Image Processing (TIP)

Journal Website

[PDF]

Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), Christine Guillemot (INRIA, France), Mohammad Ghanbari (University of Essex, UK), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

Abstract: Light field imaging, which captures both spatial and angular information, improves user immersion by enabling post-capture actions, such as refocusing and changing view perspective. However, light fields represent very large volumes of data with a lot of redundancy that coding methods try to remove. State-of-the-art coding methods indeed usually focus on improving compression efficiency and overlook other important features in light field compression such as scalability. In this paper, we propose a novel light field image compression method that enables (i) viewport scalability, (ii) quality scalability, (iii) spatial scalability, (iv) random access, and (v) uniform quality distribution among viewports, while keeping compression efficiency high. To this end, light fields in each spatial resolution are divided into sequential viewport layers, and viewports in each layer are encoded using the previously encoded viewports. In each viewport layer, the available viewports are used to synthesize intermediate viewports using a video interpolation deep learning network. The synthesized views are used as virtual reference images to enhance the quality of intermediate views. An image super-resolution method is applied to improve the quality of the lower spatial resolution layer. The super-resolved images are also used as virtual reference images to improve the quality of the higher spatial resolution layer.
The proposed structure also improves the flexibility of light field streaming, provides random access to the viewports, and increases error resiliency. The experimental results demonstrate that the proposed method achieves a high compression efficiency and it can adapt to the display type, transmission channel, network condition, processing power, and user needs.

Keywords—Light field, compression, scalability, random access, deep learning.

Acknowledgements: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

 

Tuesday, May 16, 2023

MPEG news: a report from the 142nd meeting

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The 142nd MPEG meeting was held face-to-face in Antalya, Türkiye. The official press release can be found here and comprises the following items:
  • MPEG issues Call for Proposals for Feature Coding for Machines
  • MPEG finalizes the 9th Edition of MPEG-2 Systems
  • MPEG reaches the First Milestone for Storage and Delivery of Haptics Data
  • MPEG completes 2nd Edition of Neural Network Coding (NNC)
  • MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video
  • MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling
The press release text has been modified to match the target audience of ACM SIGMM and highlight research aspects targeting researchers in video technologies. This column focuses on the 9th edition of MPEG-2 Systems, storage and delivery of haptics data, neural network coding (NNC), MPEG immersive video (MIV), and updates on MPEG-DASH.

Feature Coding for Video Coding for Machines (FCVCM)

At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks. This work on "Feature Coding for Video Coding for Machines (FCVCM)" aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the neural networks increase in complexity, use cases such as computational offload become more relevant to facilitate widespread deployment of applications utilizing such networks. Initially as part of the "Video Coding for Machines" activity, over the last four years, MPEG has investigated potential technologies for efficient compression of feature data encountered within neural networks. This activity has resulted in establishing a set of 'feature anchors' that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.

Research aspects: FCVCM is about compression, and the central research aspect here is compression efficiency which can be tested against a commonly agreed dataset (anchors). Additionally, it might be attractive to research which features are relevant for video coding for machines (VCM) and quality metrics in this emerging domain. One might wonder whether, in the future, robots or other AI systems will participate in subjective quality assessments.

9th Edition of MPEG-2 Systems

MPEG-2 Systems was first standardized in 1994, defining two container formats: program stream (e.g., used for DVDs) and transport stream. The latter, also known as MPEG-2 Transport Stream (M2TS), is used for broadcast and internet TV applications and services. MPEG-2 Systems has been awarded a Technology and Engineering Emmy® in 2013 and at the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of ISO/IEC 13818-1 MPEG-2 Systems. The new edition includes support for Low Complexity Enhancement Video Coding (LCEVC), the youngest in the MPEG family of video coding standards on top of more than 50 media stream types, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.

Research aspects: MPEG container formats such as MPEG-2 Systems and ISO Base Media File Format are necessary for storing and delivering multimedia content but are often neglected in research. Thus, I would like to take up the cudgels on behalf of the MPEG Systems working group and argue that researchers should pay more attention on these container formats and conduct research and experiments for its efficient use with respect to multimedia storage and delivery.

Storage and Delivery of Haptics Data

At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering the nature of haptics data composed of spatial and temporal components, a data unit with various spatial or temporal data packets is used as a basic entity like an access unit of audio-visual media. Additionally, an explicit indication of a silent period considering the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.
Research aspects: Coding (ISO/IEC 23090-31) and carriage (ISO/IEC 23090-32) of haptics data goes hand in hand and needs further investigation concerning compression efficiency and storage/delivery performance with respect to various use cases.

Neural Network Coding (NNC)

Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.

Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes of the neural network parameters but may also involve structural changes in the neural network (e.g., when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, such that much more bandwidth over time is required, e.g., in contrast to the initial deployment of trained neural networks.
The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and extending the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to at least 10-20% and, for several architectures, even below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.

Research aspects: The incremental compression of neural networks enables various new use cases, which provides research opportunities for media coding and communication, including optimization thereof.

MPEG Immersive Video

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.
MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-planar video (MPI) that leverages strong hardware support for commonly used video formats to compress volumetric video.

ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. This draft includes 23 verified and validated conformance bitstreams spanning all profiles and encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual

Research aspects: Conformance and reference software are usually provided to facilitate product conformance testing, but it also provides researchers with a common platform and dataset, allowing for the reproducibility of their research efforts. Luckily, conformance and reference software are typically publicly available with an appropriate open source license.

MPEG-DASH Updates

Finally, I'd like to provide a quick update regarding MPEG-DASH, which has become a new part, namely redundant encoding and packaging for segmented live media (REAP; ISO/IEC 23009-9). The following figure provides the reference workflow for redundant encoding and packaging of live segmented media.
The reference workflow comprises (i) Ingest Media Presentation Description (I-MPD), (ii) Distribution Media Presentation Description (D-MPD), and (iii) Storage Media Presentation Description (S-MPD), among others; each defining constraints on the MPD and tracks of ISO base media file format (ISOBMFF).

Additionally, the MPEG-DASH Break out Group discussed various technologies under consideration, such as (a) combining HTTP GET requests, (b) signaling common media client data (CMCD) and common media server data (CMSD) in a MPEG-DASH MPD, (c) image and video overlays in DASH, and (d) updates on lower latency.

An updated overview of DASH standards/features can be found in the Figure below.

Research aspects: The REAP committee draft (CD) is publicly available feedback from academia and industry is appreciated. In particular, first performance evaluations or/and reports from proof of concept implementations/deployments would be insightful for the next steps in the standardization of REAP.

The 143rd MPEG meeting will be held in Geneva from July 17-21, 2023. Click here for more information about MPEG meetings and their developments.

    Tuesday, April 11, 2023

    Hybrid P2P-CDN Architecture for Live Video Streaming: An Online Learning Approach

    IEEE Global Communications Conference (GLOBECOM)

    December 4-8, 2022 |Rio de Janeiro, Brazil

    Conference Website

    [PDF][Slides]

    Reza Farahani (Alpen-Adria-Universität Klagenfurt, Austria), Abdelhak Bentaleb (National University of Singapore, Singapore), Ekrem Cetinkaya (Alpen-Adria-Universität Klagenfurt, Austria), Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria), Roger Zimmermann (National University of Singapore, Singapore), and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt, Austria)

    Abstract: a cost-effective, scalable, and flexible architecture that supports low latency and high-quality live video streaming is still a challenge for Over-The-Top (OTT) service providers. To cope with this issue, this paper leverages Peer-to-Peer (P2P), Content Delivery Network (CDN), edge computing, Network Function Virtualization (NFV), and distributed video transcoding paradigms to introduce a hybRId P2P-CDN arcHiTecture for livE video stReaming (RICHTER). We first introduce RICHTER’s multi-layer architecture and design an action tree that considers all feasible resources provided by peers, edge, and CDN servers for serving peer requests with minimum latency and maximum quality. We then formulate the problem as an optimization model executed at the edge of the network. We present an Online Learning (OL) approach that leverages an unsupervised Self Organizing Map (SOM) to (i) alleviate the time complexity issue of the optimization model and (ii) make it a suitable solution for large-scale scenarios by enabling decisions for groups of requests instead of for single requests. Finally, we implement the RICHTER framework, conduct our experiments on a large-scale cloud-based testbed including 350 HAS players, and compare its effectiveness with baseline systems. The experimental results illustrate that RICHTER outperforms baseline schemes in terms of users’ Quality of Experience (QoE), latency, and network utilization, by at least 59%, 39%, and 70%, respectively.

    Index Terms—HAS; Edge Computing; NFV; CDN; P2P; Low Latency; QoE; Video Transcoding; Online Learning.

    Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

    Friday, April 7, 2023

    Elsevier Signal Processing: Reversible Data Hiding for Color Images Based on Pixel Value Order of Overall Process Channel Correlation

    Reversible Data Hiding for Color Images Based on Pixel Value Order of Overall Process Channel Correlation

    Elsevier Signal Processing

    [pdf]

    Journal Website

    Ningxiong Mao (Southwest Jiaotong University), Hongjie Hea (Southwest Jiaotong University), Fan Chenb (Southwest Jiaotong University), Lingfeng Qu (Southwest Jiaotong University), Hadi Amirpour (Alpen-Adria-Universität Klagenfurt, Austria), and Christian Timmerer (Alpen-Adria-Universität Klagenfurt, Austria)

    Abstract:

    Color image Reversible Data Hiding (RDH) is getting more and more important since the number of its applications is steadily growing. This paper proposes an efficient color image RDH scheme based on pixel value ordering (PVO), in which the channel correlation is fully utilized to improve the embedding performance. In the proposed method, the channel correlation is used in the overall process of data embedding, including prediction stage, block selection and capacity allocation. In the prediction stage, since the pixel values in the co-located blocks in different channels are monotonically consistent, the large pixel values are collected preferentially by pre-sorting the intra-block pixels. This can effectively improve the embedding capacity of RDH based on PVO. In the block selection stage, the description accuracy of block complexity value is improved by exploiting the texture similarity between the channels. The smoothing the block is then preferentially used to reduce invalid shifts. To achieve low complexity and high accuracy in capacity allocation, the proportion of the expanded prediction error to the total expanded prediction error in each channel is calculated during the capacity allocation process. The experimental results show that the proposed scheme achieves significant superiority in fidelity over a series of state-of-the-art schemes. For example, the PSNR of the Lena image reaches 62.43dB, which is a 0.16dB gain compared to the best results in the literature with a 20,000bits embedding capacity.

    KeywordsReversible data hiding, color image, pixel value ordering, channel correlation

    Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.

     

    Thursday, April 6, 2023

    VCD: Video Complexity Dataset

    The 13th ACM Multimedia Systems Conference (ACM MMSys 2022) Open Dataset and Software (ODS) track

    June 14–17, 2022 |  Athlone, Ireland

    Conference Website
    [PDF]

    Hadi Amirpour, Vignesh V Menon, Samira Afzal, Mohammad Ghanbari, and Christian Timmerer
    Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität Klagenfurt

    Abstract: This paper provides an overview of the open Video Complexity Dataset (VCD) which comprises 500 Ultra High Definition (UHD) resolution test video sequences. These sequences are provided at 24 frames per second (fps) and stored online in losslessly encoded 8-bit 4:2:0 format. In this paper, all sequences are characterized by spatial and temporal complexities, rate-distortion complexity, and encoding complexity with the x264 AVC/H.264 and x265 HEVC/H.265 video encoders. The dataset is tailor-made for cutting-edge multimedia applications such as video streaming, two-pass encoding, per-title encoding, scene-cut detection, etc. Evaluations show that the dataset includes diversity in video complexities. Hence, using this dataset is recommended for training and testing video coding applications. All data have been made publicly available as part of the dataset, which can be used for various applications.

    The details of VCD can be accessed online at https://vcd.itec.aau.at.

    Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.


    Wednesday, April 5, 2023

    LEADER: A Collaborative Edge- and SDN-Assisted Framework for HTTP Adaptive Video Streaming

     LEADER: A Collaborative Edge- and SDN-Assisted Framework for HTTP Adaptive Video Streaming

    IEEE International Conference on Communications (ICC)

    May 16–20, 2022 | Seoul, South Korea

    Conference Website
    [PDF][Slides][Video]

    Reza Farahani (Alpen-Adria-Universität Klagenfurt),  Farzad Tashtarian (Alpen-Adria-Universität Klagenfurt), Christian Timmerer (Alpen-Adria-Universität Klagenfurt), Mohammad Ghanbari (School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK), and Hermann Hellwagner (Alpen-Adria-Universität Klagenfurt).

     

    Abstract: With the emerging demands of high-definition and low-latency video streams, HTTP Adaptive Streaming (HAS) is considered the principal video delivery technology over the Internet. Network-assisted video streaming schemes, which employ modern networking paradigms, e.g., Software-Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing, have been introduced as promising complementary solutions in the HAS context to improve users' Quality of Experience (QoE) as well as network utilization. However, the existing network-assisted HAS schemes have not fully used edge collaboration techniques and SDN capabilities for achieving the aforementioned aims. To bridge this gap, this paper introduces a coLlaborative Edge- and SDN-Assisted framework for HTTP aDaptive vidEo stReaming (LEADER). In LEADER, the SDN controller collects various information items and runs a central optimization model that minimizes the HAS clients' serving time, subject to the network's and edge servers' resource constraints. Due to the NP-completeness and impractical overheads of the central optimization model, we propose an online distributed lightweight heuristic approach consisting of two phases that runs over the SDN controller and edge servers, respectively. We implement the proposed framework, conduct our experiments on a large-scale testbed including 250 HAS players, and compare its effectiveness with other strategies. The experimental results demonstrate that LEADER outperforms baseline schemes in terms of both users' QoE and network utilization, by at least 22% and 13%, respectively.

    Keywords:

    Dynamic Adaptive Streaming over HTTP (DASH), Network-Assisted Video Streaming, Video Transcoding, Quality of Experience (QoE), Software-Defined Networking (SDN), Network Function Virtualization (NFV), Edge Computing, Edge Collaboration.

    Acknowledgments: The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development, and the Christian Doppler Research Association, is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.