Thursday, April 27, 2017

ACM MMSys'17 special session paper accepted: "Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP: Design, Implementation, and Evaluation"


Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP
Design, Implementation, and Evaluation 

Mario Graf (Bitmovin), Christian Timmerer (AAU/Bitmovin), and Christopher Mueller (Bitmovin)


Abstract: Real-time entertainment services such as streaming audio-visual content deployed over the open, unmanaged Internet now account for more than 70% of Internet traffic during peak periods. More and more such bandwidth-hungry applications and services are being proposed, including immersive media services such as virtual reality and, specifically, omnidirectional/360-degree video. The adaptive streaming of omnidirectional video over HTTP imposes an important challenge on today’s video delivery infrastructures, which calls for dedicated, thoroughly designed techniques for content generation, delivery, and consumption.

This paper describes the usage of tiles (as specified within modern video codecs such as HEVC/H.265 and VP9) to enable bandwidth-efficient adaptive streaming of omnidirectional video over HTTP, and we define various streaming strategies. To this end, the parameters and characteristics of a dataset for omnidirectional video are proposed and exemplarily instantiated to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality aspects in terms of viewport PSNR. The results indicate bitrate savings from 40% (in a realistic scenario with recorded head movements from real users) up to 65% (in an ideal scenario with a centered/fixed viewport) and serve as a baseline and guidelines for advanced techniques, including the outline of a research roadmap for the near future.
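To make the viewport PSNR metric mentioned in the abstract a bit more concrete, here is a minimal, hypothetical sketch (not taken from the paper): it crudely approximates viewport extraction by cropping the equirectangular frame around the viewing direction, without proper spherical reprojection, and then computes a standard PSNR on that region. Frame sizes, FOV values, and the degradation pattern are illustrative assumptions only.

```python
import numpy as np

def psnr(ref: np.ndarray, deg: np.ndarray, max_val: float = 255.0) -> float:
    """Standard PSNR between two images of identical shape."""
    mse = np.mean((ref.astype(np.float64) - deg.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def crop_viewport(frame: np.ndarray, yaw_deg: float, pitch_deg: float,
                  fov_h: float = 90.0, fov_v: float = 90.0) -> np.ndarray:
    """Very rough viewport approximation: crop the equirectangular frame around
    the viewing direction (no spherical reprojection; for illustration only)."""
    h, w = frame.shape[:2]
    cx = int((yaw_deg % 360.0) / 360.0 * w)        # yaw   -> column
    cy = int((90.0 - pitch_deg) / 180.0 * h)       # pitch -> row
    vw, vh = int(fov_h / 360.0 * w), int(fov_v / 180.0 * h)
    cols = np.arange(cx - vw // 2, cx + vw // 2) % w            # wrap horizontally
    rows = np.clip(np.arange(cy - vh // 2, cy + vh // 2), 0, h - 1)
    return frame[np.ix_(rows, cols)]

# Stand-in frames: degrade only the part of the frame outside the yaw-0 viewport.
ref = np.random.randint(0, 256, (960, 1920, 3), dtype=np.uint8)
deg = ref.copy()
deg[:, 480:1440] //= 2      # columns outside the 90-degree viewport at yaw 0
print("full-frame PSNR:", psnr(ref, deg))
print("viewport PSNR  :", psnr(crop_viewport(ref, 0, 0), crop_viewport(deg, 0, 0)))
```

The viewport PSNR ignores degradation outside the viewport, which is exactly why it is the more meaningful quality measure for viewport-adaptive streaming.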

ACM MMSys 2017: http://mmsys17.iis.sinica.edu.tw/

Please feel free to contact me for details; I'd also be happy to meet you at ACM MMSys'17.

ACM MMSys'17 demo paper accepted: "AdViSE: Adaptive Video Streaming Evaluation Framework for the Automated Testing of Media Players"

AdViSE: Adaptive Video Streaming Evaluation Framework for the Automated Testing of Media Players

Anatoliy Zabrovskiy (Petrozavodsk State University), Evgeny Kuzmin (Petrozavodsk State University), Evgeny Petrov (Petrozavodsk State University), Christian Timmerer (AAU/Bitmovin), and Christopher Mueller (Bitmovin)

Abstract: Today we can observe a plethora of adaptive video streaming services and media players which support interoperable formats like DASH and HLS. Most of the players and their rate adaptation algorithms work as a black box. We have developed a system for easy and rapid testing of media players under various network scenarios. In this paper, we introduce AdViSE, the Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players. The presented framework is used for the comparison and testing of media players in the context of adaptive video streaming over HTTP in web/HTML5 environments.

The demonstration showcases a series of experiments with different media players under given context conditions (e.g., network shaping, delivery format). We will also demonstrate the real-time capabilities of the framework and offline analysis including several QoE metrics with respect to a newly introduced bandwidth index.

ACM MMSys 2017: http://mmsys17.iis.sinica.edu.tw/

Please feel free to contact me for details; I'd also be happy to meet you at ACM MMSys'17.

QoMEX'17 paper accepted: "Towards Subjective Quality of Experience Assessment for Omnidirectional Video Streaming"

Towards Subjective Quality of Experience Assessment for Omnidirectional Video Streaming 

Raimund Schatz (AIT), Andreas Sackl (AIT), Christian Timmerer (AAU/Bitmovin) and Bruno Gardlo (AIT)

Abstract: Currently, we witness dramatically increasing interest in immersive media technologies like Virtual Reality (VR), particularly in omnidirectional video (OV) streaming. Omnidirectional (also called 360-degree) videos are panoramic spherical videos in which the user can look around during playback and which can therefore be understood as hybrids between traditional movie streaming and interactive VR worlds. Unfortunately, streaming this kind of content is extremely bandwidth-intensive (compared to traditional 2D video) and therefore, Quality of Experience (QoE) tends to deteriorate significantly in the absence of continuous optimal bandwidth conditions.

In this paper, we present a first approach towards subjective QoE assessment for omnidirectional video (OV) streaming. We present the results of a lab study on the QoE impact of stalling in the context of OV streaming using head-mounted displays (HMDs). Our findings show that subjective testing for immersive media like OV is not trivial, with even simple cases like stalling leading to unexpected results. After a discussion of characteristic pitfalls and lessons learned, we provide a set of recommendations for upcoming OV assessment studies.


Please feel free to contact me for details; I'd also be happy to meet you at QoMEX'17.

Wednesday, April 5, 2017

Adaptive Streaming of Traditional and Omnidirectional Media


This tutorial will be given at the following events:

Abstract

This tutorial consists of three main parts. In the first part, we provide a detailed overview of the HTML5 standard and show how it can be used for adaptive streaming deployments. In particular, we focus on HTML5 video and media extensions; multi-bitrate encoding, encapsulation, and encryption workflows; and a survey of well-established streaming solutions. Furthermore, we present experiences from existing deployments and the relevant de jure and de facto standards (DASH, HLS, CMAF) in this space. In the second part, we focus on omnidirectional (360°) media from creation to consumption. We survey means for the acquisition, projection, coding and packaging of omnidirectional media as well as delivery, decoding and rendering methods. Emerging standards and industry practices are covered as well. The last part presents some of the current research trends, open issues that need further exploration and investigation, and various efforts that are underway in the streaming industry.

Target Audience and Prerequisite Knowledge

This tutorial includes both introductory and advanced level information. The audience is expected to have an understanding of basic video coding and IP networking principles. Researchers, developers, content and service providers are all welcome.

Table of Contents

  • Part I: The HTML5 Standard and Adaptive Streaming
    • HTML5 video and media extensions
    • Survey of well-established streaming solutions
    • Multi-bitrate encoding, and encapsulation and encryption workflows
    • The MPEG-DASH standard, Apple HLS and the developing CMAF standard
  • Part II: Omnidirectional (360°) Media
    • Acquisition, projection, coding and packaging of 360° video
    • Delivery, decoding and rendering methods
    • The developing MPEG-OMAF and MPEG-I standards
  • Part III: Open Issues and Future Directions
    • Common issues in scaling and improving quality, multi-screen/hybrid delivery
    • Ongoing industry efforts

Speakers

Ali C. Begen recently joined the computer science department at Ozyegin University. Previously, he was a research and development engineer at Cisco, where he architected, designed and developed algorithms, protocols, products and solutions in the service provider and enterprise video domains. Currently, in addition to teaching and research, he provides consulting services to industrial, legal, and academic institutions through Networked Media, a company he co-founded. Begen holds a Ph.D. degree in electrical and computer engineering from Georgia Tech. He has received a number of scholarly and industry awards, and he holds editorial positions in prestigious magazines and journals in the field. He is a senior member of the IEEE and a senior member of the ACM. In January 2016, he was elected as a distinguished lecturer by the IEEE Communications Society. Further information on his projects, publications, talks, and teaching, standards and professional activities can be found at http://ali.begen.net.

Christian Timmerer received his M.Sc. (Dipl.-Ing.) in January 2003 and his Ph.D. (Dr.techn.) in June 2006 (for research on the adaptation of scalable multimedia content in streaming and constrained environments) both from the Alpen-Adria-Universität (AAU) Klagenfurt. He joined the AAU in 1999 (as a system administrator) and is currently an Associate Professor at the Institute of Information Technology (ITEC) within the Multimedia Communication Group. His research interests include immersive multimedia communications, streaming, adaptation, Quality of Experience, and Sensory Experience. He was the general chair of WIAMIS 2008, QoMEX 2013, and MMSys 2016 and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, COST IC1003 QUALINET, and ICoSOLE. He also participated in ISO/MPEG work for several years, notably in the area of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH where he also served as standard editor. In 2012 he cofounded Bitmovin (http://www.bitmovin.com/) to provide professional services around MPEG-DASH where he holds the position of the Chief Innovation Officer (CIO).

Saturday, April 1, 2017

VR/360 Streaming Standardization-Related Activities

Universal media access (UMA), as proposed in the late 1990s and early 2000s, is now a reality. It is very easy to generate, distribute, share, and consume any media content, anywhere, anytime, and with/on any device. Such real-time entertainment services, specifically streaming audio and video, are typically deployed over the open, unmanaged Internet and now account for more than 70% of the evening traffic in North American fixed access networks. It is assumed that this number will reach 80% by the end of 2020. A major technical breakthrough and enabler was certainly adaptive streaming over HTTP, resulting in the standardization of MPEG-DASH.

One of the next big things in adaptive media streaming is most likely related to virtual reality (VR) applications and, specifically, omnidirectional (360-degree) media streaming, which is currently built on top of existing adaptive streaming ecosystems. The major interfaces of such an ecosystem are shown below and were described in a Bitmovin blog post some time ago (note: this work has since evolved into Immersive Media, referred to as MPEG-I).



Omnidirectional video (ODV) content allows the user to freely change her/his viewing direction while consuming the video, resulting in a more immersive experience than traditional video content with a fixed viewing direction. Such content can be consumed on devices ranging from smartphones and desktop computers to dedicated head-mounted displays (HMDs) like Oculus Rift, Samsung Gear VR, HTC Vive, etc. When using an HMD to watch such content, the viewing direction is changed by head movements. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around thanks to built-in sensors. On a desktop computer, the mouse or keyboard can be used to interact with the omnidirectional video.

The streaming of ODV content is currently deployed in a naive way: the entire 360-degree scene/view is simply streamed in constant quality, without optimizing the quality for the user’s viewport.
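To illustrate what a viewport-aware alternative could look like, here is a small, hypothetical sketch (not taken from any specification): given a viewing direction and field of view, it determines which tiles of an equirectangular tile grid overlap the viewport and would therefore be requested in higher quality. The 6x4 grid and the FOV values are assumptions for illustration only.

```python
def viewport_tiles(yaw_deg, pitch_deg, fov_h=100.0, fov_v=70.0, cols=6, rows=4):
    """Return (col, row) indices of equirectangular tiles overlapping the viewport.
    The tiles form a cols x rows grid over 360 x 180 degrees; all parameter values
    are illustrative and not taken from any specification."""
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    # Horizontal extent of the viewport in [0, 360); it may wrap around the seam.
    left, right = (yaw_deg - fov_h / 2) % 360.0, (yaw_deg + fov_h / 2) % 360.0
    # Vertical extent in [0, 180], where 0 corresponds to looking straight up.
    top = max(0.0, 90.0 - pitch_deg - fov_v / 2)
    bottom = min(180.0, 90.0 - pitch_deg + fov_v / 2)

    def h_overlap(c):
        t0, t1 = c * tile_w, (c + 1) * tile_w
        if left <= right:                     # viewport does not cross 0/360
            return t0 < right and t1 > left
        return t0 < right or t1 > left        # viewport wraps around the seam

    def v_overlap(r):
        t0, t1 = r * tile_h, (r + 1) * tile_h
        return t0 < bottom and t1 > top

    return [(c, r) for c in range(cols) for r in range(rows)
            if h_overlap(c) and v_overlap(r)]

# Looking straight ahead (yaw 0, pitch 0): only the tiles around the seam are needed.
print(viewport_tiles(0.0, 0.0))
```

A client could then request only these tiles in high quality and the remaining tiles in low quality, which is the basic idea behind tile-based viewport-adaptive streaming.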

There are several standardization-related activities ongoing which I'd like to highlight in this blog post.
The VR Industry Forum (VR-IF) has been established with the aim "to further the widespread availability of high quality audiovisual VR experiences, for the benefit of consumers" and comprises working groups related to requirements, guidelines, interoperability, communications, and liaison. The VR-IF has only just started, but it may find itself in a similar role for VR as the DASH-IF has for DASH.

QUALINET is a European network concerned with Quality of Experience (QoE) in multimedia systems and services. In terms of VR/360, it runs a task force on "Immersive Media Experiences (IMEx)" to which everyone is invited to contribute. QUALINET also coordinates standardization activities in this area and can help organize and conduct formal QoE assessments in various domains. For example, it conducted various experiments during the development of MPEG-H High Efficiency Video Coding (HEVC).

JPEG started an initiative called Pleno focusing on images. Additionally, the JPEG XS requirements document references VR applications, and JPEG recently created an AhG on JPEG360 with the mandates to collect and define use cases for 360-degree image capture applications, develop requirements for such use cases, solicit industry engagement, collect evidence of existing solutions, and update the description of needed metadata.

In terms of MPEG, I've previously reported about MPEG-I as part of my MPEG report (also see above), which currently includes five parts. The first part will be a technical report describing the scope of this new standard and a set of use cases and applications from which actual requirements can be derived. Technical reports are usually publicly available for free. The second part specifies the omnidirectional media application format (OMAF), addressing the urgent need of the industry for a standard in this area. Part three will address immersive video and part four defines immersive audio. Finally, part five will contain a specification for point cloud compression, for which a call for proposals is currently available. OMAF is part of a first phase of standards related to immersive media and should become available by the end of 2017 or beginning of 2018, while the other parts are scheduled for a later stage, around 2020. The current OMAF committee draft comprises a specification of i) the equirectangular projection format (note that others might be added in the future), ii) metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data, iii) a storage format adopting the ISO base media file format (ISOBMFF/mp4), and iv) the following codecs: MPEG-H High Efficiency Video Coding (HEVC) and MPEG-H 3D audio.

The Spatial Relationship Descriptor (SRD) of the MPEG-DASH standard provides the means to describe how media content is organized in the spatial domain. In particular, the SRD is fully integrated into the media presentation description (MPD) of MPEG-DASH and is used to describe a grid of rectangular tiles, which allows a client implementation to request only a given region of interest, typically associated with a contiguous set of tiles. Interestingly, the SRD was developed before OMAF, and how the SRD is used with OMAF is currently subject to standardization.
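As a small illustration of the SRD concept, the sketch below parses the comma-separated SRD value as I understand it from the standard (source_id, object_x, object_y, object_width, object_height, and optionally total_width, total_height, spatial_set_id); please double-check the exact syntax against ISO/IEC 23009-1 before relying on it. The tile grid in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SRD:
    """Spatial Relationship Descriptor value as carried in a DASH MPD
    (Supplemental/EssentialProperty with schemeIdUri urn:mpeg:dash:srd:2014).
    Field layout follows my reading of the standard; verify against 23009-1."""
    source_id: int
    object_x: int
    object_y: int
    object_width: int
    object_height: int
    total_width: Optional[int] = None
    total_height: Optional[int] = None
    spatial_set_id: Optional[int] = None

def parse_srd(value: str) -> SRD:
    # The value attribute is a comma-separated list of integers.
    return SRD(*[int(p) for p in value.split(",")])

# A hypothetical 3x2 tile grid over a 3840x2160 source; this describes the top-left tile.
print(parse_srd("1, 0, 0, 1280, 1080, 3840, 2160"))
```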

MPEG established an AhG on Immersive Media Quality Evaluation with the goals of documenting requirements for VR QoE, collecting test material, studying existing methods for QoE assessment, and developing a test methodology -- very ambitious.

3GPP is working on a technical report on Virtual Reality (VR) media services over 3GPP which provides an introduction to VR, various use cases, media formats, interface aspects, and -- finally -- latency and synchronization aspects.

IEEE has started IEEE P2048, specifically "P2048.2 Standard for Virtual Reality and Augmented Reality: Immersive Video Taxonomy and Quality Metrics" -- to define different categories and levels of immersive video -- and "P2048.3 Standard for Virtual Reality and Augmented Reality: Immersive Video File and Stream Formats" -- to define formats of immersive video files and streams, and the functions and interactions enabled by the formats -- but not much material is available right now. However, P2048.2 seems to be related to QUALINET, and P2048.3 could definitely benefit from what MPEG has done and is still doing (incl. also, e.g., MPEG-V). Additionally, there's IEEE P3333.3, which defines a standard for HMD-based 3D content motion sickness reducing technology to resolve VR sickness caused by HMD-based 3D content through the study of i) the visual response to focal distortion, ii) the visual response to lens materials, iii) the visual response to lens refraction ratio, and iv) the visual response to frame rate.

The ITU-T started a new work program referred to as "G.QoE-VR" after successfully finalizing P.NATS, which is now called P.1203. However, there are no details about "G.QoE-VR" publicly available yet; so far I just found this here. In this context, it's worth mentioning the Video Quality Experts Group (VQEG), which has an Immersive Media Group (IMG) whose mission is the "quality assessment of immersive media, including virtual reality, augmented reality, stereoscopic 3DTV, multiview".

Finally, the Khronos Group announced a VR standards initiative which resulted in OpenXR (Cross-Platform, Portable, Virtual Reality), defining an API for VR and AR applications. It, too, could benefit from MPEG standards in terms of codecs, file formats, and streaming formats. In this context, WebVR already defines an API which provides support for accessing virtual reality devices on the web, including sensors and head-mounted displays.

DVB started a CM Study Mission Group on Virtual Reality which released an executive summary comprising mission statements of individuals/companies. The topic has also been discussed at DVB World. The goal of DVB CM-VR is to investigate the commercial case in the context of the DVB project, which eventually may lead to technical specifications.

Most of these standards activities are currently in their infancy but definitely worth following. If you think I missed something, please let me know and I'm happy to include it / update this blog post.

Thursday, March 30, 2017

360-degree (live) adaptive streaming with RICOH THETA S and Bitmovin

Recently I got the RICOH THETA S 360-degree camera and asked myself how to set up a (live) adaptive streaming session using the Bitmovin cloud encoding and HTML5 player. I quickly found some general guidelines on the internet, but before providing step-by-step instructions one has to consider the following:
  • Update the firmware of your RICOH THETA S by downloading the basic app, starting it (while the camera is connected via USB), going to File -> Firmware Update..., and following the steps on the screen. It's pretty easy and mine got updated from v1.11 to v1.82.
  • Think about a storage solution for the files generated by the Bitmovin cloud encoding; possible options are FTP, Amazon S3, Google Cloud Storage, and Dropbox. I used Amazon S3 for this setup, which requires a bucket name, "AWS Access Key", and "AWS Secret Key".
  • Set up a basic website and make sure it works with the Bitmovin HTML5 player for video-on-demand services with the content hosted on the previously selected storage solution (i.e., avoid any CORS issues; a minimal configuration sketch follows after this list). In my setup I used Wordpress and the Bitmovin Wordpress plugin, which makes it very easy...
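For completeness, here is one way to avoid CORS issues when the content is hosted on Amazon S3: attach a CORS policy to the bucket that allows your player page to fetch manifests and segments. This is a minimal boto3 sketch with placeholder bucket/origin values; in a real deployment you would restrict the origin to your own site and review the policy carefully.

```python
import boto3

# Placeholders: replace with your own bucket and player origin.
BUCKET = "ricoh-livestream-test"
PLAYER_ORIGIN = "https://www.example.com"

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket=BUCKET,
    CORSConfiguration={
        "CORSRules": [
            {
                # Allow the player page to fetch MPD/M3U8 manifests and segments.
                "AllowedOrigins": [PLAYER_ORIGIN],
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["Content-Length", "Content-Type"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)
print("CORS policy applied to", BUCKET)
```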

Step 1: Follow steps 1-4 from here.

Follow steps 1-4 from the general guidelines. Basically, install the live-streaming app, register the device, and install/configure OBS. Enable live streaming on the RICOH THETA S and, within OBS, use the "Custom Streaming Server" option in the "Stream" settings. That basically connects the RICOH THETA S with OBS on your local computer. The next step is forwarding this stream to the Bitmovin cloud encoding service for DASH/HLS streaming.

Step 2: Create a new Bitmovin Output

  1. Login to the Bitmovin portal and go to Encoding -> Outputs -> Create Output
  2. Select Amazon S3 and use any “Output Profile name”, e.g., ricoh-livestream-test
  3. Enter the name of your Bucket from Amazon S3
  4. The prefix is not needed
  5. Select any “Host-Region” (preferably one close to where you are)
  6. Enter the "AWS Access Key" and the "AWS Secret Key" from Amazon S3
  7. Make sure the "Create Public S3 URLs" checkbox is enabled
An example screenshot is shown below.

Finally, click the “+” sign to create it and if everything is correct, the output will be created, otherwise an error message will be shown. In such a case, make sure the bucket name and keys are correct as provided when creating a bucket on Amazon S3.

Step 3: Create a new Bitmovin Livestream

  1. Login to the Bitmovin portal and go to Live (beta) -> Create Livestream
  2. Select "Encoding-Profile": bitcodin fullHD is sufficient (4K not needed as the device provides only fullHD)
  3. Select "Output-Profile": select the output you’ve created in previous step (ricoh-livestream-test)
  4. Add a "Livestream-Name" (any string works here), e.g., ricoh-livestream-test
  5. Add a "Stream-Key" (any string works here), e.g., ricohlivestreamtest
  6. Click "Create Live Stream", an "Important Notice" shows up & click "Create Live Stream"
  7. Wait for the RTMP PUSH URL to be used in OBS (this could take some time; you may reload the page or go to the "Overview")
An example screenshot is shown below which displays the RTMP PUSH URL, Stream Key, MPD URL, and HLS URL to be used in the next steps.

The next step is to start streaming in OBS which provides the live stream from the RICOH THETA S to the Bitmovin cloud encoding.

Step 4: Start Streaming in OBS

  1. Go to OBS -> Settings
  2. In section "Stream", select "Custom Streaming Server"
  3. Enter the RTMP PUSH URL from Bitmovin in the "URL" field of OBS
  4. Enter the Stream Key from Bitmovin in the "Stream key" field of OBS
  5. Click "OK" and then click "Start Streaming" in OBS
An example screenshot is shown below and if everything works fine, OBS will stream to the Bitmovin cloud encoding service.
The final step is setting up the HTML5 player.

Step 5: Setup the HTML5 Player

Basically, follow the instructions here; in my case I simply used Wordpress and the Bitmovin Wordpress plugin.
  1. Go to the Bitmovin WP plugin
  2. Select "Add New Video"
  3. Enter any name/title of the new video
  4. In the "Video" section, enter the "DASH URL" and "HLS URL" from the Bitmovin livestream provided in step 3 (i.e., the MPD URL and the HLS URL)
  5. In the "Player" section, select the latest stable version (in my case this was version 7)
  6. In the "VR" section, select startup mode "2d" and leave the rest as is
An example screenshot is shown below.
Finally, click on "Publish" in Wordpress, which will give you a shortcode to be placed (copy/paste) into your site or post, and you're done!

The setup during the live streaming session is shown in the screenshot below. The RICOH THETA S on the right is mounted on a tripod and connected via USB. My MacBook Pro runs OBS (see display on the right) which streams it to the Bitmovin cloud encoding and also shows the live streaming session within a browser (see display on the left) using the Bitmovin HTML5 player.



A similar approach can be used for video-on-demand content, but in that case you don't need OBS: simply encode your content using the Bitmovin cloud encoding, transfer it to your web server, and use the HTML5 player for the actual streaming.

Friday, February 10, 2017

MPEG news: a report from the 117th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will be also posted at ACM SIGMM Records.
MPEG News Archive
The 117th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Committee Draft of the Omnidirectional Media Application Format (OMAF)
  • MPEG-H 3D Audio Verification Test Report
  • MPEG Workshop on 5-Year Roadmap Successfully Held in Geneva
  • Call for Proposals (CfP) for Point Cloud Compression (PCC)
  • Preliminary Call for Evidence on video compression with capability beyond HEVC
  • MPEG issues Committee Draft of the Media Orchestration (MORE) Standard
  • Technical Report on HDR/WCG Video Coding
In this blog post, I'd like to focus on the topics related to multimedia communication. Thus, let's start with OMAF.

Omnidirectional Media Application Format (OMAF)

Real-time entertainment services (streaming audio and video) deployed over the open, unmanaged Internet now account for more than 70% of the evening traffic in North American fixed access networks, and it is assumed that this figure will reach 80% by 2020. More and more such bandwidth-hungry applications and services are pushing onto the market, including immersive media services such as virtual reality and, specifically, 360-degree video. However, the lack of appropriate standards and, consequently, reduced interoperability is becoming an issue. Thus, MPEG has started a project referred to as the Omnidirectional Media Application Format (OMAF). The first milestone of this standard has been reached and the committee draft (CD) was approved at the 117th MPEG meeting. Such application formats "are essentially superformats that combine selected technology components from MPEG (and other) standards to provide greater application interoperability, which helps satisfy users' growing need for better-integrated multimedia solutions" [MPEG-A]. In the context of OMAF, the following aspects are defined:
  • Equirectangular projection format (note: others might be added in the future)
  • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
  • Storage format: ISO base media file format (ISOBMFF)
  • Codecs: High Efficiency Video Coding (HEVC) and MPEG-H 3D audio
OMAF is the first specification defined as part of a bigger project currently referred to as ISO/IEC 23090 -- Immersive Media (Coded Representation of Immersive Media). It currently has the acronym MPEG-I; we previously used MPEG-VR, which is now replaced by MPEG-I (that still might change in the future). It is expected that the standard will become Final Draft International Standard (FDIS) by Q4 of 2017. Interestingly, it does not include AVC and AAC, probably the most obvious candidates for video and audio codecs, which have been massively deployed in the last decade and probably will remain a major dominator (and also denominator) in upcoming years. On the other hand, the equirectangular projection format is currently the only one defined, as it is already broadly used in off-the-shelf hardware/software solutions for the creation of omnidirectional/360-degree videos. Finally, the metadata formats enabling the rendering of 360-degree monoscopic and stereoscopic video are highly appreciated. A solution for MPEG-DASH based on AVC/AAC utilizing the equirectangular projection format for both monoscopic and stereoscopic video is shown as part of Bitmovin's solution for VR and 360-degree video.

Research aspects related to OMAF can be summarized as follows:
  • HEVC supports tiles, which allow for efficient streaming of omnidirectional video, but HEVC is not as widely deployed as AVC. Thus, it would be interesting to investigate how to mimic such a tile-based streaming approach utilizing AVC.
  • How to efficiently encode and package HEVC tile-based video is an open issue and calls for a tradeoff between tile flexibility and coding efficiency.
  • When combined with MPEG-DASH (or similar), there's a need to update the adaptation logic, as tiles add yet another dimension that needs to be considered in order to provide a good Quality of Experience (QoE); a toy sketch of such a tile-aware adaptation is given after this list.
  • QoE is a big issue here and not well covered in the literature. Various aspects are worth investigating, including a comprehensive dataset to enable reproducibility of research results in this domain. Finally, as omnidirectional video allows for interactivity, the user experience also becomes an issue which needs to be covered within the research community.
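As a toy illustration of the adaptation-logic point above, the sketch below greedily distributes an estimated bandwidth budget across tiles, upgrading viewport tiles first and the remaining tiles afterwards. The tile grid, bitrate ladder, and greedy strategy are assumptions for illustration, not a proposed or standardized algorithm.

```python
def allocate_tile_bitrates(budget_kbps, viewport_tiles, all_tiles, ladder_kbps):
    """Greedy, illustrative allocation: every tile starts at the lowest rung of the
    bitrate ladder; the remaining budget is then spent upgrading viewport tiles
    first and non-viewport tiles afterwards. Returns {tile: bitrate in kbps}."""
    ladder = sorted(ladder_kbps)
    rung = {t: 0 for t in all_tiles}                      # ladder index per tile
    spent = len(all_tiles) * ladder[0]
    if spent > budget_kbps:
        raise ValueError("budget below minimum quality for all tiles")
    order = list(viewport_tiles) + [t for t in all_tiles if t not in viewport_tiles]
    for t in order:
        while rung[t] + 1 < len(ladder):
            step = ladder[rung[t] + 1] - ladder[rung[t]]  # cost of the next upgrade
            if spent + step > budget_kbps:
                break
            rung[t] += 1
            spent += step
    return {t: ladder[rung[t]] for t in all_tiles}

tiles = [(c, r) for c in range(6) for r in range(4)]      # hypothetical 6x4 tile grid
viewport = [(0, 1), (0, 2), (5, 1), (5, 2)]               # tiles facing the user
print(allocate_tile_bitrates(20000, viewport, tiles, [300, 800, 1500, 3000]))
```

A real client would additionally have to cope with head-movement prediction, segment-length constraints, and buffer dynamics, which is exactly where the open research questions lie.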
A second topic I'd like to highlight in this blog post is related to the preliminary call for evidence on video compression with capability beyond HEVC.

Preliminary Call for Evidence on video compression with capability beyond HEVC

A call for evidence is issued to see whether sufficient technological potential exists to start a more rigorous phase of standardization. Currently, MPEG together with VCEG has developed a Joint Exploration Model (JEM) algorithm that is already known to provide bit rate reductions in the range of 20-30% for relevant test cases, as well as subjective quality benefits. The goal of this new standard -- with a preliminary target date for completion around late 2020 -- is to develop technology providing better compression capability than the existing standard, not only for conventional video material but also for other domains such as HDR/WCG or VR/360-degree video. An important aspect in this area is certainly over-the-top video delivery (as with MPEG-DASH), which includes features such as scalability and Quality of Experience (QoE). Scalable video coding has been added to video coding standards since MPEG-2 but never reached widespread adoption. That might change in case it becomes a prime-time feature of a new video codec, as scalable video coding clearly shows benefits when doing dynamic adaptive streaming over HTTP (a back-of-envelope calculation follows below). QoE has already found its way into video coding, at least when it comes to evaluating the results, where subjective tests are now an integral part of every new video codec developed by MPEG (in addition to the usual PSNR measurements). Therefore, the most interesting research topic from a multimedia communication point of view would be to optimize the DASH-like delivery of such new codecs with respect to scalability and QoE. Note that if you don't like scalable video coding, feel free to propose something else as long as it reduces storage and networking costs significantly.
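To give a feeling for the potential storage savings of scalable coding mentioned above, here is a back-of-envelope calculation with an assumed bitrate ladder and an assumed per-layer coding-efficiency penalty; both numbers are made up for illustration and real figures depend heavily on content and codec.

```python
# Back-of-envelope: storage/CDN footprint of a simulcast bitrate ladder versus a
# scalable (layered) encoding. All numbers below are assumptions for illustration.
ladder_mbps = [1.0, 2.5, 5.0, 8.0]     # hypothetical DASH representations
svc_overhead = 0.15                    # assumed per-layer coding-efficiency penalty

simulcast_total = sum(ladder_mbps)     # every representation is stored separately
# Scalable coding stores one base layer plus differential enhancement layers.
layers = [ladder_mbps[0]] + [hi - lo for lo, hi in zip(ladder_mbps, ladder_mbps[1:])]
scalable_total = sum(layers) * (1 + svc_overhead)

print(f"simulcast: {simulcast_total:.1f} Mbit per second of content")
print(f"scalable : {scalable_total:.1f} Mbit per second of content "
      f"({100 * (1 - scalable_total / simulcast_total):.0f}% less)")
```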

MPEG Workshop “Global Media Technology Standards for an Immersive Age”

On January 18, 2017, MPEG successfully held a public workshop on “Global Media Technology Standards for an Immersive Age”, hosting a series of keynotes from Bitmovin, DVB, Orange, Sky Italia, and Technicolor. Stefan Lederer, CEO of Bitmovin, discussed today's and future challenges with new forms of content like 360°, AR, and VR. All slides are available here and MPEG took the feedback into consideration in an update of its 5-year standardization roadmap. David Wood (EBU) reported on the DVB VR study mission and Ralf Schaefer (Technicolor) presented a snapshot of VR services. Gilles Teniou (Orange) discussed video formats for VR, pointing out a new opportunity to increase the content value but also raising the question of what is missing today. Finally, Massimo Bertolotti (Sky Italia) introduced his view on the immersive media experience age.

Overall, the workshop was well attended and, as mentioned above, MPEG is currently working on a new standards project related to immersive media. Currently, this project comprises five parts. The first part comprises a technical report describing the scope (incl. a kind of system architecture), use cases, and applications. The second part is OMAF (see above) and the third/fourth parts are related to immersive video and audio, respectively. Part five is about point cloud compression.

For those interested, please check out the slides from industry representatives in this field and draw your own conclusions about what could be interesting for your own research. I'm happy to see any reactions, hints, etc. in the comments.

Finally, let's have a look what happened related to MPEG-DASH, a topic with a long history on this blog.

MPEG-DASH and CMAF: Friend or Foe?

For MPEG-DASH and CMAF it was a meeting "in between" official standardization stages. MPEG-DASH experts are still working on the third edition, which will be a consolidated version of the 2nd edition and various amendments and corrigenda. In the meantime, MPEG issued a white paper on the new features of MPEG-DASH, which I would like to highlight here.
  • Spatial Relationship Description (SRD): allows describing tiles and regions of interest for partial delivery of media presentations. This is highly related to OMAF and VR/360-degree video streaming.
  • External MPD linking: this feature allows describing the relationship between a single program/channel and a preview mosaic channel having all channels at once within the MPD.
  • Period continuity: a simple signaling mechanism to indicate whether one period is a continuation of the previous one, which is relevant for ad insertion or live programs.
  • MPD chaining: allows chaining two or more MPDs to each other, e.g., a pre-roll ad when joining a live program.
  • Flexible segment format for broadcast TV: separates the signaling of the switching points and random access points in each stream; thus, the content can be encoded with good compression efficiency, yet allowing a higher number of random access points with a lower frequency of switching points.
  • Server and network-assisted DASH (SAND): enables asynchronous network-to-client and network-to-network communication of quality-related assisting information.
  • DASH with server push and WebSockets: basically addresses issues related to the HTTP/2 push feature and WebSockets.
CMAF issued a study document which captures the current progress, and all national bodies are encouraged to take this into account when commenting on the Committee Draft (CD). To answer the question in the headline above, it looks more and more like DASH and CMAF will become friends -- let's hope that the friendship lasts for a long time.

What else happened at the MPEG meeting?

  • Committee Draft MORE (note: type 'man more' in any unix/linux/mac terminal and you'll get 'less - opposite of more' ;): MORE stands for “Media Orchestration” and provides a specification that enables the automated combination of multiple media sources (cameras, microphones) into a coherent multimedia experience. Additionally, it targets use cases where a multimedia experience is rendered on multiple devices simultaneously, again giving a consistent and coherent experience.
  • Technical Report on HDR/WCG Video Coding: This technical report comprises conversion and coding practices for High Dynamic Range (HDR) and Wide Colour Gamut (WCG) video coding (ISO/IEC 23008-14). The purpose of this document is to provide a set of publicly referenceable recommended guidelines for the operation of AVC or HEVC systems adapted for compressing HDR/WCG video for consumer distribution applications.
  • CfP Point Cloud Compression (PCC): This call solicits technologies for the coding of 3D point clouds with associated attributes such as color and material properties. It will be part of the immersive media project introduced above.
  • MPEG-H 3D Audio verification test report: This report presents results of four subjective listening tests that assessed the performance of the Low Complexity Profile of MPEG-H 3D Audio. The tests covered a range of bit rates and a range of “immersive audio” use cases (i.e., from 22.2 down to 2.0 channel presentations). Seven test sites participated in the tests with a total of 288 listeners.
The next MPEG meeting will be held in Hobart, April 3-7, 2017. Feel free to contact us for any questions or comments.