Tuesday, June 11, 2019

Video Developer Survey 2019: Call for Participation

After the success of the 2018 and 2017 editions of the Video Developer Survey, Bitmovin is once again calling for your participation in this year's survey, which is accessible here:

The goal is to learn from you about the video codecs, formats and platforms used in your organization and how you see the tech evolving in the next year. 

An example outcome from last year's edition is shown below, which illustrates the planned video codec usage in the next 12 months compared to the 2017 report.
Planned video codec usage in the next 12 months 2017 vs. 2018.

Tuesday, June 4, 2019

QUALINET Online Lecture: On The Privacy Preserving Modelling for QoE

Presenters: Selim Ickin and Jörgen Gustafsson, Ericsson Research
When: 21 June, 2019 - 11:00-12:00 CEST

Machine learning models in the area of QoE potentially suffer from over-fitting due to limitations including low data volume and limited participant profiles. This might prevent models from being generic if the QoE ML problem is not well formulated; hence, these trained models risk performing unexpectedly when tested outside the experimented population. One reason for the limited datasets, which are referred to as QoE data lakes, is that they often contain user-sensitive information and are only collected through expensive user studies with special user consent. Thus, sharing datasets amongst researchers has been challenging. In this talk, we will discuss a few state-of-the-art privacy-preserving machine learning training techniques that potentially enable sharing of learned knowledge amongst different small data lakes.

Tuesday, April 30, 2019

MPEG news: a report from the 126th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG News Archive

The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:
  • Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
  • Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
  • Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
  • Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
  • MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
  • MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies
The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126

Three Degrees of Freedom Plus (3DoF+)

MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video

MPEG’s support for 360-degree video — also referred to as omnidirectional video — is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). It basically enables the utilization of the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content using a head mounted display (HMD). However, rendering flat 360-degree video may generate visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.

At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.

Research aspects: Subjective evaluations in the context of 3DoF+ but also immersive media services in general are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX) resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research and standards developing organizations (SDOs) and industry fora which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (and also other SDOs) often organize public workshops and there will be one during the next meeting, specifically on July 10, 2019 in Gothenburg, Sweden which will be about "Coding Technologies for Immersive Audio/Visual Experiences". Further details are available here.

Neural Network Compression for Multimedia Applications

MPEG evaluates responses to the Call for Proposal and kicks off its technical work

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to a number of clients using them in applications (e.g., mobile phones, smart cameras) requires compressed representation of neural networks.

At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for Neural Network Compression. These technologies address compressing neural network parameters in order to reduce their size for transmission and to improve the efficiency of using them, while not or only moderately reducing their performance in specific multimedia applications.

After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).
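To get a feeling for what "10% or less of the original size" means, here is a minimal sketch that assumes plain uniform scalar quantization of 32-bit float weights. This is an illustrative stand-in and not one of the technologies submitted to MPEG; the function names and the randomly generated weights are hypothetical.

```python
import random


def quantize_uniform(weights, bits):
    """Map each weight to one of 2**bits evenly spaced levels (illustrative only)."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step


def dequantize(indices, lo, step):
    """Reconstruct approximate weights from the integer level indices."""
    return [lo + i * step for i in indices]


def size_ratio(bits, original_bits=32):
    """Compressed size relative to the original, ignoring the tiny (lo, step) header."""
    return bits / original_bits


random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(10_000)]  # made-up weights
for bits in (8, 4, 3):
    idx, lo, step = quantize_uniform(weights, bits)
    restored = dequantize(idx, lo, step)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits} bits: {size_ratio(bits):.1%} of original, max error {max_err:.4f}")
```

With fixed-length codes alone, about 3 bits per weight are needed to get below 10% (3/32 is roughly 9.4%). Whether the application-level performance (e.g., classification accuracy) survives such a representation is exactly what has to be measured per use case.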

Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.

Low Complexity Enhancement Video Coding

MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development

MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agile extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.

The target is to achieve:
  • coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
  • coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.
This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as starting point and test model for the standard. The new standard is expected to become part of the MPEG-5 suite of codecs and its development is expected to be completed in 2020.

Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG basically addressing requirements and needs going beyond HEVC. As usual, research mainly focuses on compression efficiency but a general trend in video coding is probably observable that favors software-based solutions rather than pure hardware coding tools. As such, complexity -- both at encoder and decoder -- is becoming important as well as power efficiency which are additional factors to be taken into account. Other issues are related to business aspects which are typically discussed elsewhere, e.g., here.

Point Cloud Compression

MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage

MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is appropriate especially for sparse point clouds.

MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.

G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and lossy coding with acceptable quality reaches ratios of up to 35:1.

Research aspects: After V-PCC, MPEG has now promoted G-PCC to CD but, in principle, the same research aspects are relevant as discussed here. Thus, coding efficiency is the number one performance metric but coding complexity and power consumption also need to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically ACM MM and ACM MMSys.

MPEG Media Transport (MMT)

MPEG approves 3rd Edition of Final Draft International Standard

MMT 3rd edition will introduce two aspects:
  • enhancements for mobile environments and
  • support of Content Delivery Networks (CDNs).
The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.

Additionally, support for intelligent network entities involved in media services (i.e., Media Aware Network Entity (MANE)) will make MMT-based services adapt to changes of the mobile network faster and better. Since support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource update, and media request are being added in this edition.

Ongoing developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support of FCAST in the context of MMT.

Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing existing/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question, however, remains, namely what is the next big thing in multimedia delivery/transport, as we are currently in a phase where tools like HTTP adaptive streaming (HAS) have reached maturity and the multimedia research community is eager to work on new topics in this domain.

Let me finish this blog post with...

DASH, what else?

MPEG is working towards the 4th edition of ISO/IEC 23009-1 MPEG Dynamic Adaptive Streaming over HTTP (DASH) but I guess it won't have such a huge set of new tools added to the standard (as was the case from the 2nd to the 3rd edition). However, no public information is available so far. Other resources relevant to DASH can be found on the DASH-IF website (guidelines, dash.js), Mile High Video 2019, ACM MMSys 2019, etc.

Thursday, April 4, 2019

ACM NOSSDAV'19: Bandwidth Prediction in Low-Latency Chunked Streaming

Bandwidth Prediction in Low-Latency Chunked Streaming


Abdelhak Bentaleb (National University of Singapore), Christian Timmerer (Alpen-Adria Universität & Bitmovin Inc.), Ali C. Begen (Ozyegin University), and Roger Zimmermann (National University of Singapore)

Abstract: HTTP adaptive streaming (HAS) with chunked transfer encoding can be used to reduce latency without sacrificing the coding efficiency. While this allows a media segment to be generated and delivered at the same time, it also causes grossly inaccurate bandwidth measurements, leading to incorrect bitrate selections. To overcome this effect, we design a novel Adaptive bitrate scheme for Chunked Transfer Encoding (ACTE) that leverages the unique nature of chunk downloads. It uses a sliding window to accurately measure the available bandwidth and an online linear adaptive filter to predict the available bandwidth into the future. Results show that ACTE achieves 96% measurement accuracy, which translates to a 64% reduction in stalls and a 27% increase in video quality.
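The two building blocks named in the abstract, a sliding-window measurement and an online linear adaptive filter (RLS is listed in the keywords), can be sketched roughly as follows. This is not the authors' ACTE implementation: the class names, window size, number of taps, forgetting factor and the assumption that each chunk duration already excludes idle gaps between chunks are all mine, chosen for illustration only.

```python
from collections import deque


class SlidingWindowMeter:
    """Average throughput over the last `size` chunk downloads."""

    def __init__(self, size=5):
        self.samples = deque(maxlen=size)  # (bytes, seconds) per chunk

    def add_chunk(self, num_bytes, seconds):
        # Assumption: `seconds` is pure download time, without idle gaps.
        self.samples.append((num_bytes, seconds))

    def throughput_mbps(self):
        total_bytes = sum(b for b, _ in self.samples)
        total_time = sum(t for _, t in self.samples)
        return 8 * total_bytes / total_time / 1e6 if total_time > 0 else 0.0


class RLSPredictor:
    """One-step-ahead linear predictor adapted online with recursive least squares."""

    def __init__(self, taps=3, forgetting=0.99, delta=1000.0):
        self.taps = taps
        self.lam = forgetting
        self.w = [0.0] * taps
        # Inverse correlation matrix estimate, initialised to delta * identity.
        self.P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
        self.history = deque(maxlen=taps)

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def update(self, measurement):
        """Feed a new measurement; adapt the weights once enough history exists."""
        if len(self.history) == self.taps:
            x = list(self.history)
            Px = [self._dot(row, x) for row in self.P]
            denom = self.lam + self._dot(x, Px)
            gain = [v / denom for v in Px]
            error = measurement - self._dot(self.w, x)
            self.w = [wi + gi * error for wi, gi in zip(self.w, gain)]
            self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.lam
                       for j in range(self.taps)] for i in range(self.taps)]
        self.history.append(measurement)

    def predict(self):
        """Predict the next measurement; fall back to the last value while warming up."""
        if len(self.history) < self.taps:
            return self.history[-1] if self.history else 0.0
        return self._dot(self.w, list(self.history))


meter = SlidingWindowMeter(size=3)
predictor = RLSPredictor(taps=3)
chunks = [(125_000, 0.10), (125_000, 0.12), (125_000, 0.08), (125_000, 0.10), (125_000, 0.11)]
for num_bytes, seconds in chunks:  # made-up chunk downloads
    meter.add_chunk(num_bytes, seconds)
    predictor.update(meter.throughput_mbps())
print(f"measured {meter.throughput_mbps():.2f} Mbit/s, "
      f"predicted next {predictor.predict():.2f} Mbit/s")
```

An ABR scheme would then pick the highest bitrate below the predicted value (possibly with a safety margin); that selection logic is left out here on purpose.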

Keywords: HAS; ABR; DASH; CMAF; low-latency; HTTP chunked transfer encoding; bandwidth measurement and prediction; RLS.

Acknowledgment: This research has been supported in part by the Singapore Ministry of Education Academic Research Fund Tier 1 under MOE's official grant number T1 251RES1820 and the Austrian Research Promotion Agency (FFG) under the Next Generation Video Streaming project "PROMETHEUS".

Tuesday, March 12, 2019

QoMEX'19: Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation

Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation


Raimund Schatz (AIT Austrian Institute of Technology), Anatoliy Zabrovskiy (Alpen-Adria Universität Klagenfurt), and Christian Timmerer (Alpen-Adria Universität Klagenfurt, Bitmovin Inc.)

Abstract: Omnidirectional video (ODV) streaming applications are becoming increasingly popular. They enable a highly immersive experience as the user can freely choose her/his field of view within the 360-degree environment. Current deployments are fairly simple but viewport-agnostic which inevitably results in high storage/bandwidth requirements and low Quality of Experience (QoE). A promising solution is referred to as tile-based streaming which allows higher quality within the user’s viewport while quality outside the user’s viewport could be lower. However, empirical QoE assessment studies in this domain are still rare. Thus, this paper investigates the impact of different tile-based streaming approaches and configurations on the QoE of ODV. We present the results of a lab-based subjective evaluation in which participants evaluated 8K omnidirectional video QoE as influenced by different (i) tile-based streaming approaches (full vs. partial delivery), (ii) content types (static vs. moving camera), and (iii) tile encoding quality levels determined by different quantization parameters. Our experimental setup is characterized by high reproducibility since relevant media delivery aspects (including the user’s head movements and dynamic tile quality adaptation) are already rendered into the respective processed video sequences. Additionally, we performed a complementary objective evaluation of the different test sequences focusing on bandwidth efficiency and objective quality metrics. The results, presented and discussed in detail in this paper, confirm that tile-based streaming of ODV improves visual quality while reducing bandwidth requirements.
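The basic idea of tile-based streaming (higher quality inside the viewport, lower quality outside) can be sketched as a per-tile quantization parameter (QP) assignment. The following is a simplified illustration and not the setup used in the paper: the tile grid, field of view and QP values are assumptions, and the geometry ignores that a viewport near the poles covers a wider yaw range in an equirectangular layout.

```python
def tile_qps(yaw_deg, pitch_deg, cols=6, rows=4, hfov=100.0, vfov=90.0,
             qp_viewport=22, qp_outside=42):
    """Return a rows x cols grid of QPs for an equirectangular tile layout.

    Tiles overlapping the viewport get qp_viewport (higher quality), all others
    qp_outside. Grid size, FoV and QP values are illustrative assumptions.
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    grid = []
    for r in range(rows):
        # Pitch range covered by this tile row, from +90 (top) down to -90 (bottom).
        top = 90.0 - r * tile_h
        bottom = top - tile_h
        in_pitch = bottom < pitch_deg + vfov / 2 and top > pitch_deg - vfov / 2
        row = []
        for c in range(cols):
            # Angular distance between tile centre and viewport centre, with wrap-around.
            centre = -180.0 + (c + 0.5) * tile_w
            diff = abs((centre - yaw_deg + 180.0) % 360.0 - 180.0)
            in_yaw = diff < (hfov + tile_w) / 2
            row.append(qp_viewport if in_pitch and in_yaw else qp_outside)
        grid.append(row)
    return grid


# Viewer looking straight ahead, then looking "backwards" across the 180-degree seam.
for yaw in (0.0, 180.0):
    print(f"yaw={yaw}")
    for row in tile_qps(yaw, 0.0):
        print("  ", row)
```

A client following this scheme would request the low-QP representation for the viewport tiles and the high-QP one for the rest, and recompute the grid whenever the head orientation changes.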

Index Terms: Omnidirectional Video, Tile-based Streaming, Subjective Testing, Objective Metrics, Quality of Experience

Acknowledgment: This work was supported in part by the Austrian Research Promotion Agency (FFG) under the Next Generation Video Streaming project "PROMETHEUS".

Thursday, February 21, 2019

Mobile data traffic report and forecast 2017-2022

On Feb 18, 2019, both Sandvine and Cisco released their mobile data traffic report and forecast (2017-2022), respectively.

Let's start with the Sandvine 2019 Mobile Internet Phenomena Report, which features the global (except significant portions of China and India) mobile traffic share of applications with respect to downstream, upstream, and connections.

The main message is "YouTube is the global leader with over 35% of worldwide mobile traffic, dwarfing Netflix’s 15% share in the Global Report." Looking at the global application traffic share for downstream we have YouTube (37.04%), Facebook video (2.53%) and Netflix (2.44%); in total around 42% is video (compared to almost 58% in the global report from October 2018). The top applications are shown in the figure below.
Source: Sandvine 2019 Mobile Internet Phenomena Report, Feb 18, 2019.
This report also highlights QoE, which basically focuses on throughput, latency, and packet loss and leads to a so-called ScoreCard as shown below. However, streaming video is not considered a delay-sensitive application here, which might be true for video on demand but could look different for live services, specifically with respect to the start-up delay and the delay compared to traditional TV services.
Source: Sandvine 2019 Mobile Internet Phenomena Report, Feb 18, 2019.
Cisco's Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2017–2022 measured that mobile video traffic accounted for 59% of total mobile data traffic in 2017, which means more than half of all mobile data traffic, similarly to almost 58% in the Sandvine global report from October 2018. However, please note the 42% in the Sandvine mobile report although Sandvine didn't have a specific number for video only.

Interestingly, Cisco predicts that nearly 79% of the world’s mobile data traffic will be video by 2022; it will increase 9-fold between 2017 and 2022. Furthermore, mobile video will grow at a CAGR of 55% between 2017 and 2022, which is higher than the overall average mobile traffic CAGR of 46%. Of the 77 exabytes per month crossing the mobile network by 2022, nearly 61 exabytes will be due to video (see figure below).
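As a quick sanity check, the numbers quoted above are consistent with each other. The short sketch below only redoes the arithmetic with the figures from this post; the helper names are mine.

```python
def cagr_from_factor(factor, years):
    """Compound annual growth rate implied by an overall growth factor."""
    return factor ** (1.0 / years) - 1.0


def growth_factor(cagr, years):
    """Overall growth factor implied by a compound annual growth rate."""
    return (1.0 + cagr) ** years


years = 2022 - 2017
print(f"9-fold over {years} years -> CAGR of {cagr_from_factor(9, years):.1%}")
print(f"46% CAGR over {years} years -> {growth_factor(0.46, years):.1f}-fold overall")
print(f"video share in 2022: 61 of 77 EB/month -> {61 / 77:.1%}")
```

A 9-fold increase over five years corresponds to a CAGR of roughly 55%, and 61 out of 77 exabytes is roughly 79%, which matches the quoted figures.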
Source: Cisco VNI Mobile, Feb 18, 2019.
In any case, these reports confirm that video is already responsible for the majority of data traffic worldwide for both mobile and fixed-network access; and it will continue to grow...

Monday, February 18, 2019

MPEG news: a report from the 125th meeting, Marrakesh, Morocco

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

The 125th MPEG meeting concluded on January 18, 2019 in Marrakesh, Morocco with the following topics:
  • Network-Based Media Processing (NBMP) – MPEG promotes NBMP to Committee Draft stage
  • 3DoF+ Visual – MPEG issues Call for Proposals on Immersive 3DoF+ Video Coding Technology
  • MPEG-5 Essential Video Coding (EVC) – MPEG starts work on MPEG-5 Essential Video Coding
  • ISOBMFF – MPEG issues Final Draft International Standard of Conformance and Reference software for formats based on the ISO Base Media File Format (ISOBMFF)
  • MPEG-21 User Description – MPEG finalizes 2nd edition of the MPEG-21 User Description
The corresponding press release of the 125th MPEG meeting can be found here. In this blog post I’d like to focus on those topics potentially relevant for over-the-top (OTT), namely NBMP, EVC, and ISOBMFF.

Network-Based Media Processing (NBMP)

The NBMP standard addresses the increasing complexity and sophistication of media services, specifically as the incurred media processing requires offloading complex media processing operations to the cloud/network to keep receiver hardware simple and power consumption low. Therefore, the NBMP standard provides a standardized framework that allows content and service providers to describe, deploy, and control media processing for their content in the cloud. It comes with two main functions: (i) an abstraction layer to be deployed on top of existing cloud platforms (+ support for 5G core and edge computing) and (ii) a workflow manager to enable composition of multiple media processing tasks (i.e., process incoming media and metadata from a media source and produce processed media streams and metadata that are ready for distribution to a media sink). The NBMP standard has now reached Committee Draft (CD) stage and the final milestone is targeted for early 2020.

In particular, a standard like NBMP might come in handy in the context of 5G in combination with mobile edge computing (MEC), which allows offloading certain tasks to a cloud environment in close proximity to the end user. For OTT, this could enable lower latency and more content being personalized towards the user’s context conditions and needs, hopefully leading to a better quality and user experience.

For further research aspects please see one of my previous posts.

MPEG-5 Essential Video Coding (EVC)

MPEG-5 EVC clearly targets the high demand for efficient and cost-effective video coding technologies. Therefore, MPEG commenced work on such a new video coding standard that should have two profiles: (i) royalty-free baseline profile and (ii) main profile, which adds a small number of additional tools, each of which is capable, on an individual basis, of being either cleanly switched off or else switched over to the corresponding baseline tool. Timely publication of licensing terms (if any) is obviously very important for the success of such a standard.

The target coding efficiency for responses to the call for proposals was to be at least as efficient as HEVC. This target was exceeded by approximately 24% and the development of the MPEG-5 EVC standard is expected to be completed in 2020.

As of today, there’s the need to support AVC, HEVC, VP9, and AV1; soon VVC will become important. In other words, we already have a multi-codec environment to support and one might argue one more codec is probably not a big issue. The main benefit of EVC will be a royalty-free baseline profile but with AV1 there’s already such a codec available and it will be interesting to see how the royalty-free baseline profile of EVC compares to AV1.

For a new video coding format we will witness a plethora of evaluations and comparisons with existing formats (i.e., AVC, HEVC, VP9, AV1, VVC). These evaluations will be mainly based on objective metrics such as PSNR, SSIM, and VMAF. It will also be interesting to see subjective evaluations, specifically targeting OTT use cases (e.g., live and on demand).
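Of the objective metrics mentioned, PSNR is the simplest one and can be written down in a few lines. The sketch below assumes 8-bit samples given as flat lists (e.g., the luma plane of one frame); real evaluations typically average over frames and planes, which is omitted here.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up pixel values
distorted = [54, 55, 60, 67, 69, 62, 64, 71]
print(f"PSNR: {psnr(reference, distorted):.2f} dB")
```

SSIM and VMAF are considerably more involved (VMAF, for example, relies on a trained model), so they are better taken from existing tools than re-implemented.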

ISO Base Media File Format (ISOBMFF)

The ISOBMFF (ISO/IEC 14496-12) is used as the basis for many file (e.g., MP4) and streaming formats (e.g., DASH, CMAF) and as such has received widespread adoption in both industry and academia. An overview of ISOBMFF is available here. The reference software is now available on GitHub and a plethora of conformance files is available here. In this context, the open source project GPAC is probably the most interesting aspect from a research point of view.
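For those who have never looked inside such a file: ISOBMFF is organized as a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. A minimal sketch for walking the top-level boxes could look as follows; it is not the reference software or GPAC, the function name is mine, and the byte string at the end is a hand-made toy example rather than a real media file.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_offset, payload_size) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 signals a 64-bit "largesize" right after the type field.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box; stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, size - header
        offset += size


# Toy example: an 'ftyp' box with 8 payload bytes followed by a tiny 'mdat' box.
toy = (struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x00\x00"
       + struct.pack(">I4s", 11, b"mdat") + b"abc")
for box_type, payload_offset, payload_size in iter_boxes(toy):
    print(box_type, payload_offset, payload_size)
```

Container boxes such as 'moov' can be descended into by calling the same function again on the payload range, i.e., with `offset=payload_offset` and `end=payload_offset + payload_size`.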