Tuesday, July 19, 2016

MPEG news: a report from the 115th meeting, Geneva, Switzerland


The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.
MPEG News Archive
The 115th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Genomic Information Compression and Storage joint Call for Proposals in conjunction with ISO/TC 276/WG 5
  • Plug-in free decoding of 3D objects within Web browsers
  • MPEG-H 3D Audio AMD 3 reaches FDAM status
  • Common Media Application Format for Dynamic Adaptive Streaming Applications
  • 4th edition of AVC/HEVC file format
In this blog post, however, I will cover topics specifically relevant for adaptive media streaming, namely:
  • Recent developments in MPEG-DASH
  • Common media application format (CMAF)
  • MPEG-VR (virtual reality)
  • The MPEG roadmap/vision for the future.

MPEG-DASH Server and Network assisted DASH (SAND): ISO/IEC 23009-5

Part 5 of MPEG-DASH, referred to as SAND – server and network-assisted DASH – has reached FDIS. This work item started some time ago at a public MPEG workshop during the 105th MPEG meeting in Vienna. The goal of this part of MPEG-DASH is to enhance the delivery of DASH content by introducing messages between DASH clients and network elements, or between various network elements, for the purpose of improving the efficiency of streaming sessions by providing information about real-time operational characteristics of networks, servers, proxies, caches, and CDNs, as well as DASH clients' performance and status. In particular, it defines the following:
  1. The SAND architecture which identifies the SAND network elements and the nature of SAND messages exchanged among them.
  2. The semantics of SAND messages exchanged between the network elements present in the SAND architecture.
  3. An encoding scheme for the SAND messages.
  4. The minimum requirements for implementing a SAND message delivery protocol.
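To make this client-to-network message exchange a bit more tangible, here is a minimal sketch of a DASH client reporting its status to a DASH-aware network element (DANE). Note that the message name, the JSON encoding, and the endpoint URL below are purely illustrative (the standard defines its own message types and encoding scheme); the point is only to show the direction and kind of information exchanged.

```typescript
// Hypothetical client-to-DANE status report; SAND defines its own message
// types and encoding, this sketch only illustrates the reporting flow.
interface ThroughputStatus {
  messageType: "ClientThroughputStatus"; // hypothetical message name
  senderId: string;
  generationTime: string;                // ISO 8601 timestamp
  measuredThroughputKbps: number;
  bufferLevelMs: number;
}

async function reportStatus(daneUrl: string, status: ThroughputStatus): Promise<void> {
  // A DANE could aggregate such reports and send assistance messages back
  // (e.g., suggested bitrates) to improve the overall streaming session.
  await fetch(daneUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(status),
  });
}

// Example usage with a hypothetical DANE endpoint:
reportStatus("https://dane.example.com/sand/status", {
  messageType: "ClientThroughputStatus",
  senderId: "client-42",
  generationTime: new Date().toISOString(),
  measuredThroughputKbps: 3500,
  bufferLevelMs: 12000,
}).catch(console.error);
```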
The way that this information is to be utilized is deliberately not defined within the standard and left open for (industry) competition (or other standards developing organizations). In any case, there’s plenty of room for research activities around the topic of SAND, specifically:
  • A main issue is the evaluation of MPEG-DASH SAND in terms of qualitative and quantitative improvements with respect to QoS/QoE. Some papers are already available, e.g., published at ACM MMSys 2016.
  • Another topic of interest is an analysis regarding scalability and possible overhead; in other words, I'm wondering whether the improvements are worth the additional signaling SAND introduces into DASH deployments.

MPEG-DASH with Server Push and WebSockets: ISO/IEC 23009-6

Part 6 of MPEG-DASH has reached DIS stage and deals with server push and WebSocket, i.e., it specifies the carriage of MPEG-DASH media presentations over full-duplex HTTP-compatible protocols, particularly HTTP/2 and WebSocket. The specification comes with a set of generic definitions for which bindings are defined, allowing its usage with various protocols. Currently, the specification supports HTTP/2 and WebSocket.

For the former, the push policy is defined as an HTTP header extension, whereas the latter requires the definition of a DASH subprotocol. Luckily, these are the preferred extension mechanisms of HTTP/2 and WebSocket, respectively, and, thus, interoperability is provided. The question of whether or not the industry will adopt these extensions cannot be answered right now, but I would recommend keeping an eye on this, and there are certainly multiple research topics worth exploring in the future.
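To illustrate the full-duplex pattern the WebSocket binding enables, here is a minimal sketch: the client opens a single connection and the server can push segments without per-segment HTTP requests. The subprotocol token and the control message format are placeholders of my own, not the ones defined in the specification.

```typescript
// Illustrative only: subprotocol name and message format are made up.
const ws = new WebSocket("wss://streaming.example.com/dash", "mpeg-dash");
ws.binaryType = "arraybuffer";

ws.onopen = () => {
  // Ask the server to push the next segments of a given representation.
  ws.send(JSON.stringify({
    command: "push",
    representationId: "video-3000k",
    startSegment: 42,
    count: 3,
  }));
};

ws.onmessage = (event: MessageEvent) => {
  if (typeof event.data === "string") {
    console.log("control message:", event.data); // e.g., push acknowledgement
  } else {
    const segment = new Uint8Array(event.data as ArrayBuffer);
    console.log(`received pushed segment (${segment.byteLength} bytes)`);
    // ...append to a Media Source Extensions SourceBuffer for playback...
  }
};
```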

An interesting aspect for the research community would be to quantify the utility of using push methods within dynamic adaptive environments in terms of QoE and start-up delay. Some papers provide preliminary answers but a comprehensive evaluation is missing.

To conclude the recent MPEG-DASH developments, the DASH-IF recently established the Excellence in DASH Award at ACM MMSys’16 and the winners are presented here (including some of the recent developments described in this blog post).

Common Media Application Format (CMAF): ISO/IEC 23000-19

The goal of CMAF is to enable application consortia to reference a single MPEG specification (i.e., a “common media format”) that would allow a single media encoding to be used across many applications and devices. Therefore, CMAF defines the encoding and packaging of segmented media objects for delivery and decoding on end-user devices in adaptive multimedia presentations. This sounds very familiar and reminds us a bit of what the DASH-IF is doing with its interoperability points. One of the goals of CMAF is to integrate HLS into MPEG-DASH, which is backed up by this WWDC video where Apple announced support for fragmented MP4 in HLS. The streaming of this announcement is only available in Safari and through the WWDC app, but Bitmovin has shown that fragmented MP4 in HLS also works on macOS/iOS 10 and above and, for PC users, in all recent browser versions including Edge, Firefox, Chrome, and (of course) Safari.
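For those who want to check whether their own browser is ready for this, a quick (and admittedly simplistic) feature test via the Media Source Extensions could look as follows; the codec strings are just examples.

```typescript
// Feature check in a browser: can MSE play fragmented MP4, the container
// format CMAF builds on? Codec strings below are examples only.
const fmp4Video = 'video/mp4; codecs="avc1.64001f"'; // H.264 High profile, level 3.1
const fmp4Audio = 'audio/mp4; codecs="mp4a.40.2"';   // AAC-LC

if (window.MediaSource &&
    MediaSource.isTypeSupported(fmp4Video) &&
    MediaSource.isTypeSupported(fmp4Audio)) {
  console.log("Fragmented MP4 can be streamed via MSE in this browser.");
} else {
  console.log("No fMP4/MSE support; a fallback (e.g., native HLS) is needed.");
}
```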

MPEG Virtual Reality

Virtual reality is becoming a hot topic across the industry (and also academia), which has also reached standards developing organizations like MPEG. Therefore, MPEG established an ad-hoc group (with an email reflector) to develop a roadmap for MPEG-VR. Others, like DVB, DASH-IF, and QUALINET, have also started working on this (and maybe many more: W3C, 3GPP). In any case, it shows that there's massive interest in this topic, and Bitmovin has already shown what can be done in this area within today's Web environments. Obviously, adaptive streaming is an important aspect of VR applications, including many research questions to be addressed in the (near) future. A first step towards a concrete solution is the Omnidirectional Media Application Format (OMAF), which is currently at working draft stage (details to be provided in a future blog post).

The research aspects cover a wide range of activities including - but not limited to - content capturing, content representation, streaming/network optimization, consumption, and QoE.

MPEG roadmap/vision

At its 115th meeting, MPEG published a document that lays out its medium-term strategic standardization roadmap. The goal of this document is to collect feedback from anyone in professional and B2B industries dealing with media, specifically but not limited to broadcasting, content and service provision, media equipment manufacturing, and the telecommunication industry. The roadmap is depicted below and further described in the document available here. Please note that “360 AV” in the figure below also covers VR, although this is unfortunately not (yet) made explicit there. In any case, the roadmap points out the aspects to be addressed by MPEG in the future, which are relevant for both industry and academia.


The next MPEG meeting will be held in Chengdu, October 17-21, 2016.

Wednesday, December 9, 2015

Real-Time Entertainment now accounts for >70% of the Internet Traffic

Sandvine's Global Internet Phenomena Report (December 2015 edition) reveals that real-time entertainment (i.e., streaming video and audio) traffic now accounts for more than 70% of North American downstream traffic during the peak evening hours on fixed access networks (see Figure 1). Interestingly, five years ago it accounted for less than 35%.

Netflix is mainly responsible for this with a share of more than 37% (i.e., more than the total five years ago), but it already had a big share in 2011 (~32%) and hasn't "improved" that much since. The second biggest share comes from YouTube with roughly 18%.

I'm using these figures within my slides to motivate that streaming video and audio is a huge market - opening a lot of opportunities for research and innovation - and it's interesting to see how the Internet is being used. In most of these cases, the Internet is used as is, without any bandwidth guarantees, and clients adapt themselves to the bandwidth that is available. Service providers offer the content in multiple versions (e.g., different bitrates, resolutions), and each version is divided into segments to which clients can adapt both at the beginning of and during the session. This principle is known as over-the-top adaptive video streaming, and a standardized representation format is available, known as Dynamic Adaptive Streaming over HTTP (DASH), ISO/IEC 23009. Note that the adaptation logic is not part of the standard, which opens up a bunch of possibilities in terms of research and engineering (see the sketch below).
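To illustrate what such an adaptation logic could look like in its simplest form, here is a sketch of a purely throughput-based heuristic. Real players are of course far more sophisticated (buffer-based, hybrid, etc.), and the representation set below is made up for the example.

```typescript
// A deliberately simple heuristic: pick the highest bitrate that stays below
// a fraction of the measured throughput. Real clients also consider buffer
// level, bitrate smoothing, and many other signals.
interface Representation { id: string; bandwidthKbps: number; }

function selectRepresentation(
  representations: Representation[],
  measuredThroughputKbps: number,
  safetyFactor = 0.8
): Representation {
  const budget = measuredThroughputKbps * safetyFactor;
  const sorted = [...representations].sort((a, b) => a.bandwidthKbps - b.bandwidthKbps);
  let choice = sorted[0]; // fall back to the lowest quality
  for (const rep of sorted) {
    if (rep.bandwidthKbps <= budget) choice = rep;
  }
  return choice;
}

// Example: 4 Mbit/s measured throughput -> the 3 Mbit/s representation wins.
const reps: Representation[] = [
  { id: "360p", bandwidthKbps: 800 },
  { id: "720p", bandwidthKbps: 3000 },
  { id: "1080p", bandwidthKbps: 6000 },
];
console.log(selectRepresentation(reps, 4000).id); // "720p"
```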

Both Netflix and YouTube adopted the DASH format, which is now natively supported by modern Web browsers thanks to the HTML5 Media Source Extensions (MSE), and even digital rights management is possible thanks to the Encrypted Media Extensions (EME). All one needs is a client implementation that is compliant with the standard - the easy part; the standard is freely available - and adapts to the dynamically changing usage context while maximizing the Quality of Experience (QoE) - the difficult part. That's why we at bitmovin decided to set up a grand challenge at IEEE ICME 2016 in Seattle, USA with the aim of soliciting contributions addressing end-to-end delivery aspects which improve the QoE while optimally utilising the available network infrastructures and their associated costs. This includes the content preparation for DASH, the content delivery within existing networks, and the client implementations. Please feel free to contribute to this exciting problem and if you have further questions or comments, please contact us here.
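Coming back to the "easy part": the following sketch shows the bare MSE skeleton of such a client (the segment URLs and the codec string are placeholders). The "difficult part" is deciding which representation's segments to fetch next, e.g., using a heuristic like the one sketched above.

```typescript
// Minimal MSE skeleton: attach a MediaSource to a <video> element, fetch the
// initialization and media segments, and append them in order.
async function playDash(video: HTMLVideoElement, segmentUrls: string[]): Promise<void> {
  const mediaSource = new MediaSource();
  video.src = URL.createObjectURL(mediaSource);

  // Wait until the MediaSource is ready to accept a SourceBuffer.
  await new Promise<void>((resolve) =>
    mediaSource.addEventListener("sourceopen", () => resolve(), { once: true })
  );

  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');

  for (const url of segmentUrls) {
    const data = await (await fetch(url)).arrayBuffer();
    sourceBuffer.appendBuffer(data);
    // Wait until the buffer has consumed this segment before appending the next.
    await new Promise<void>((resolve) =>
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true })
    );
  }
  mediaSource.endOfStream();
}
```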

Thursday, November 12, 2015

Final Call for Papers: ACM MMSys 2016 Full Papers


Autumn is showing itself from its best side here in Klagenfurt, and this is the final call for papers for ACM MMSys 2016 full papers, with YouTube as gold sponsor and featuring the Excellence in DASH Award sponsored by the DASH-IF.

ACM MMSys 2016
May 10-13, 2016
Klagenfurt am Wörthersee, Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. While research on specific aspects of multimedia systems is regularly published in the various proceedings and transactions of the networking, operating systems, real-time systems, and database communities, MMSys aims to cut across these domains in the context of multimedia data types. This provides a unique opportunity to view the intersections and interplay of the various approaches and solutions developed across these domains to deal with multimedia data types.

MMSys is a venue for researchers who explore:
  • Complete multimedia systems that provide a new kind of multimedia experience or systems whose overall performance improves the state of the art through new research results in one or more components, or
  • Enhancements to one or more system components that provide a documented improvement over the state-of-the-art for handling continuous media or time-dependent services.
Such individual system components include:
  • Operating systems
  • Distributed architectures and protocol enhancements
  • Domain languages, development tools and abstraction layers
  • Using new architectures or computing resources for multimedia
  • New or improved I/O architectures or I/O devices, innovative uses and algorithms for their operation
  • Representation of continuous or time-dependent media
  • Metrics, measures and measurement tools to assess performance
This touches aspects of many hot topics including but not limited to: adaptive streaming, games, virtual environments, augmented reality, 3D video, Ultra-HD, HDR, immersive systems, plenoptics, 360° video, multimedia IoT, multi- and many-core, GPGPUs, mobile streaming, P2P, clouds, cyber-physical systems.

Submission Guidelines
Papers should be between 6 and 12 pages long (in PDF format) prepared in the ACM style and written in English. The submission site is open and papers can be submitted using the following URL: http://mmsys2016.itec.aau.at/online-paper-submission/

Important dates:
  • Submission Deadline: December 11, 2015 (extended from November 27, 2015)
  • Reviews available to Authors: January 15, 2016
  • Rebuttal Deadline: January 22, 2016
  • Acceptance Notification: January 29, 2016
  • Camera-ready Deadline: March 11, 2016
DASH Industry Forum Excellence in DASH Award
This award offers a financial prize for those papers which best meet the following requirements:
  1. Paper must substantially address MPEG-DASH as the presentation format
  2. Paper must be selected for presentation at ACM MMSys 2016
  3. Preference given to practical enhancements and developments which can sustain future commercial usefulness of DASH
  4. The DASH format used should conform to the DASH-IF Interoperability Points as defined at http://dashif.org/guidelines/
Further details about the Excellence in DASH Award can be found here.



Friday, September 4, 2015

HEVC, AOMedia, MPEG, and DASH

Ultra-high definition (UHD) displays have been available for quite some time, and in terms of video coding the MPEG-HEVC/H.265 standard was designed to support these high resolutions in an efficient way. And it does, with a compression performance gain of more than twice that of its predecessor MPEG-AVC/H.264. But it all comes at a cost - not only in terms of coding complexity at both encoder and decoder - especially when it comes to licensing. The MPEG-AVC/H.264 licenses are managed by MPEG LA, but for HEVC/H.265 there are two patent pools, which makes its industry adoption more difficult than it was for AVC.

HEVC was published by ISO in early 2015, and in the meantime MPEG has started discussing future video coding using its usual approach of open workshops, inviting experts from companies inside and outside of MPEG. However, now there’s the Alliance for Open Media (AOMedia) promising to provide "open, royalty-free and interoperable solutions for the next generation of video delivery” (press release). A good overview and summary is available here, which even mentions that a third HEVC patent pool is shaping up (OMG!).

Anyway, even if AOMedia’s "media codecs, media formats, and related technologies” are free as in “free beer”, it’s still not clear whether they will taste any good. Also, many big players are not part of this alliance and could (easily) come up with patent claims at a later stage, jeopardising the whole process (cf. what happened with VP9). In any case, AOMedia is certainly disruptive and, together with other disruptive media technologies (e.g., PERSEUS, although I have some doubts here), might change the media coding landscape; whether it will be a turn for the better is not clear, though...

Finally, I was wondering how all this impacts DASH, specifically as MPEG LA recently announced that they want to establish a patent pool for DASH, although major players stated some time ago that they would not charge anything for DASH (wrt licensing). In terms of media codecs, please note that DASH is codec agnostic and can work with any codec, including those not specified within MPEG, and we showed this working some time ago already (using WebM). The main problem is, however, which codecs are supported on which end-user devices and how to access them with which API (like HTML5 & MSE). For example, some Android devices support HEVC but not through HTML5 & MSE, which makes it more difficult to integrate with DASH.
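A simple way to probe this on a given device is to ask MSE directly which codecs it can actually decode and fall back in preference order. The MIME/codec strings below are only examples and, especially for HEVC, vary across platforms.

```typescript
// Probe MSE decoding support in preference order; strings are examples only.
const codecPreference = [
  'video/mp4; codecs="hvc1.1.6.L93.B0"', // HEVC/H.265
  'video/webm; codecs="vp9"',            // VP9
  'video/mp4; codecs="avc1.64001f"',     // AVC/H.264 as the safe default
];

const playable = codecPreference.find((type) => MediaSource.isTypeSupported(type));
console.log(playable ?? "No MSE-decodable codec found; native playback or plugins needed.");
```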

Using MPEG-DASH with HTML5 & MSE is currently the preferred way to deploy DASH; even the DASH-IF’s reference player (dash.js) assumes HTML5 & MSE, and companies like bitmovin offer bitdash following the same principles. Integrating new codecs on the DASH encoding side, like on bitmovin’s bitcodin cloud-based transcoding-as-a-service, isn’t a big deal and can be done very quickly as soon as software implementations are available. Thus, the problem lies more with the plethora of heterogeneous end-user devices like smartphones, tablets, laptops, computers, set-top boxes, TV sets, media gateways, gaming consoles, etc. and their variety of platforms and operating systems.

Therefore, I’m wondering whether AOMedia (or whatever will come in the future) is a real effort to change the media landscape for the better or just another competing standard to choose from … but on the other hand, as Andrew S. Tanenbaum already wrote in his book on computer networks, “the nice thing about standards is that you have so many to choose from.”

Monday, August 31, 2015

Over-the-Top Content Delivery: State of the Art and Challenges Ahead at ICME 2015

As stated in my MPEG report from Warsaw, I attended ICME'15 in Torino to give a tutorial -- together with Ali Begen -- about over-the-top content delivery. The slides are available as usual and embedded here...


If you have any questions or comments, please let us know. The goal of this tutorial is to give an overview of MPEG-DASH and also of selected informative aspects (e.g., workflows, adaptation, quality, evaluation) not covered in the standard. However, it should not be seen as a tutorial on the standard, as many approaches presented here can also be applied to other formats, although MPEG-DASH seems to be the most promising of those available. During the tutorial we ran into interesting questions and discussions with the audience, and I could also show some live demos from bitmovin using bitcodin and bitdash. Attendees were impressed by the maturity of the technology behind MPEG-DASH and how research results find their way into actual products available on the market.

If you're interested now, I'll give a similar tutorial -- with Tobias Hoßfeld -- about "Adaptive Media Streaming and Quality of Experience Evaluations using Crowdsourcing" during ITC27 (Sep 7, 2015, Ghent, Belgium) and bitmovin will be at IBC2015 in Amsterdam.


Friday, August 28, 2015

One Year of MPEG

In my last MPEG report (index) I mentioned that the 112th MPEG meeting in Warsaw was my 50th MPEG meeting, which roughly accumulates to one year of MPEG meetings. That is, one year of my life I've spent in MPEG meetings - scary, isn't it? Thus, I thought it’s time to recap what I have done in MPEG so far, featuring the following topics/standards where I had significant contributions:
  • MPEG-21 - The Multimedia Framework 
  • MPEG-M - MPEG extensible middleware (MXM), later renamed to multimedia service platform technologies 
  • MPEG-V - Information exchange with Virtual Worlds, later renamed to media context and control
  • MPEG-DASH - Dynamic Adaptive Streaming over HTTP

MPEG-21 - The Multimedia Framework

I started my work on standards, specifically MPEG, with Part 7 of MPEG-21 referred to as Digital Item Adaptation (DIA), and developed the generic Bitstream Syntax Description (gBSD) in collaboration with SIEMENS, which allows for a coding-format-independent (generic) adaptation of scalable multimedia content towards the actual usage environment (e.g., different devices, resolutions, bitrates). The main goal of DIA was to enable Universal Media Access (UMA) -- any content, anytime, anywhere, on any device -- which also motivated me to start this blog. I also wrote a series of blog entries on this topic: O Universal Multimedia Access, Where Art Thou?, which gives an overview of this topic and basically also summarizes what I’ve done in my Ph.D. thesis. Later I contributed a lot to various MPEG-21 parts, including their dissemination, and documented where the standard has been used. In the past, I saw many forms of Digital Items (e.g., iTunes LP was one of the first) but unfortunately the need for a standardised format is very low. Instead, proprietary formats are used, and I realised that developers are more into APIs than formats. The format comes with the API, but it’s the availability of an API that attracts developers and makes them adopt a certain technology.
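For illustration only, the core idea of such description-driven adaptation can be sketched as follows: the adaptation engine never parses the coding format itself, it only looks at a generic description of the bitstream and keeps the parts that fit the target. The data structure below is invented for this example and has nothing to do with the actual gBSD schema.

```typescript
// Toy illustration of coding-format-independent adaptation: each described
// unit is a byte range tagged with a scalability layer; adaptation simply
// keeps the units up to the target layer and copies their byte ranges.
interface DescribedUnit { start: number; length: number; layer: number; }

function adaptByDescription(units: DescribedUnit[], maxLayer: number): DescribedUnit[] {
  // Drop every unit belonging to an enhancement layer above the target.
  return units.filter((u) => u.layer <= maxLayer);
}

// Example: keep the base layer plus the first enhancement layer only.
const description: DescribedUnit[] = [
  { start: 0, length: 1200, layer: 0 },
  { start: 1200, length: 800, layer: 1 },
  { start: 2000, length: 900, layer: 2 },
];
console.log(adaptByDescription(description, 1)); // two units remain
```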

MPEG-M

The lessons learned from MPEG-21 were one reason why I joined the MPEG-M project, as its purpose was exactly that: to create an API into various MPEG technologies, providing developers with a tool that makes it easy for them to adopt new technologies and, thus, new formats/standards. We created an entire architecture, APIs, and reference software to make it easy for external people to adopt MPEG technologies. The goal was to hide the complexity of the technology behind simple-to-use APIs, which should enable the accelerated development of components, solutions, and applications utilising digital media content. A good overview of MPEG-M can be found on this poster.

MPEG-V

When MPEG started working on MPEG-V (it was not called that in the beginning), I saw it as an extension of UMA and MPEG-21 DIA to go beyond audio-visual experiences by stimulating potentially all human senses. We created and standardised an XML-based language that enables the annotation of multimedia content with sensory effects. Later the scope was extended to include virtual worlds, which resulted in the acronym MPEG-V. It also led me to start working on Quality of Experience (QoE), and we coined the term Quality of Sensory Experience (QuASE) as part of the (virtual) SELab at Alpen-Adria-Universität Klagenfurt, which offers a rich set of open-source software tools and datasets around this topic on top of off-the-shelf hardware (still in use in my office).

MPEG-DASH

The latest project I’m working on is MPEG-DASH, in the context of which I’ve also co-founded bitmovin, now a successful startup offering the fastest transcoding in the cloud (bitcodin) and high-quality MPEG-DASH players (bitdash). It all started when MPEG asked me to chair the evaluation of the call for proposals on HTTP streaming of MPEG media. We then created dash.itec.aau.at, which offers a huge set of open-source tools and datasets used by both academia and industry worldwide (e.g., listed on DASH-IF). I think I can proudly state that this is the most successful MPEG activity I've been involved in so far... (note: a live deployment can be found here which shows 24/7 music videos over the Internet using bitcodin and bitdash).

DASH and QuASE are also part of my habilitation, which brought me into my current position at Alpen-Adria-Universität Klagenfurt as Associate Professor. Finally, one might ask whether it was all worth spending so much time on MPEG and at MPEG meetings. I would say YES, and there are many reasons which could easily result in another blog post (or more), but it’s better to discuss this face to face. I'm sure there will be plenty of opportunities in the (near) future, or you can come to Klagenfurt, e.g., for ACM MMSys 2016 ...




Tuesday, July 28, 2015

MPEG news: a report from the 112th meeting, Warsaw, Poland



This blog post is also available at the bitmovin tech blog and SIGMM Records.

The 112th MPEG meeting in Warsaw, Poland was a special meeting for me. It was my 50th MPEG meeting, which roughly accumulates to one year of MPEG meetings (i.e., one year of my life I've spent in MPEG meetings incl. traveling - scary, isn't it? ... more on this in another blog post). But what happened at this 112th MPEG meeting (my 50th meeting)...

  • Requirements: CDVA, Future of Video Coding Standardization (no acronym yet), Genome compression
  • Systems: M2TS (ISO/IEC 13818-1:2015), DASH 3rd edition, Media Orchestration (no acronym yet), TRUFFLE
  • Video/JCT-VC/JCT-3D: MPEG-4 AVC, Future Video Coding, HDR, SCC
  • Audio: 3D audio
  • 3DG: PCC, MIoT, Wearable
MPEG Friday Plenary. Photo (c) Christian Timmerer.
As usual, the official press release and other publicly available documents can be found here. Let's dig into the different subgroups:
Requirements

In the Requirements subgroup, experts were working on the Call for Proposals (CfP) for Compact Descriptors for Video Analysis (CDVA), including an evaluation framework. The evaluation framework includes 800-1000 objects (large objects like building facades, landmarks, etc.; small(er) objects like paintings, books, statues, etc.; scenes like interior scenes, natural scenes, multi-camera shots), and the evaluation of the responses is planned for the 114th meeting in San Diego.

The future of video coding standardization is currently being discussed in MPEG, shaping the way for the successor of the HEVC standard. The current goal is to provide (native) support for scalability (more than two spatial resolutions) and a 30% compression gain for some applications (requiring a limited increase in decoder complexity); actually preferred, however, is a 50% compression gain (at a significant increase in encoder complexity). MPEG will hold a workshop at the next meeting in Geneva discussing specific compression techniques, objective (HDR) video quality metrics, and compression technologies for specific applications (e.g., multiple-stream representations, energy-saving encoders/decoders, games, drones). The current goal is to have the International Standard for this new video coding standard around 2020.

MPEG has recently started a new project referred to as Genome Compression, which is, of course, about the compression of genome information. A big dataset has been collected and experts are working on the Call for Evidence (CfE). The plan is to hold a workshop at the next MPEG meeting in Geneva regarding the prospects of genome compression and storage standardization, targeting users, manufacturers, service providers, technologists, etc.

Systems


Summer in Warsaw. Photo (c) Christian Timmerer.
The 5th edition of the MPEG-2 Systems standard has been published as ISO/IEC 13818-1:2015 on the 1st of July 2015 and is a consolidation of the 4th edition + Amendments 1-5.

In terms of MPEG-DASH, the draft text of ISO/IEC 23009-1 3rd edition, comprising the 2nd edition + COR 1 + AMD 1 + AMD 2 + AMD 3 + COR 2, is available for committee-internal review. Publication is expected, most likely, in 2016. Currently, MPEG-DASH shows a lot of activity in the following areas: spatial relationship description, generalized URL parameters, authentication, access control, multiple MPDs, full-duplex protocols (aka HTTP/2 etc.), advanced and generalized HTTP feedback information, and various core experiments:
  • SAND (Server and Network Assisted DASH)
  • FDH (Full Duplex DASH)
  • SAP-Independent Segment Signaling (SISSI)
  • URI Signing for DASH
  • Content Aggregation and Playback COntrol (CAPCO)
In particular, the core experiment process is very open as most work is conducted during the Ad hoc Group (AhG) period which is discussed on the publicly available MPEG-DASH reflector.

MPEG Systems recently started an activity related to media orchestration, which applies to capture as well as consumption and concerns scenarios with multiple sensors as well as multiple rendering devices, including one-to-many and many-to-one scenarios, resulting in a worthwhile, customized experience.

Finally, the Systems subgroup started an exploration activity regarding real-time streaming of files (a.k.a. TRUFFLE) which should perform a gap analysis leading to extensions of the MPEG Media Transport (MMT) standard. However, some experts within MPEG concluded that most/all use cases identified within this activity could actually be solved with existing technology such as DASH. Thus, this activity may still need some discussion...

Video/JCT-VC/JCT-3D

The MPEG Video subgroup is working towards a new amendment for the MPEG-4 AVC standard covering resolutions up to 8K and higher frame rates for lower resolutions. Interestingly, although MPEG is ahead of industry most of the time, 8K and high frame rates are already supported in browser environments (e.g., using bitdash: 8K, HFR) and modern encoding platforms like bitcodin. Still, it's good that we finally have the means for interoperable signaling of this profile.

In terms of future video coding standardization, the Video subgroup released a call for test material. Two sets of test sequences are already available and will be investigated regarding compression until the next meeting.

After a successful Call for Evidence for High Dynamic Range (HDR), the technical work starts in the Video subgroup with the goal of developing an architecture ("H2M") as well as three core experiments (optimization without HEVC specification changes, alternative reconstruction approaches, objective metrics).

The main topic of the JCT-VC was screen content coding (SCC), which came up with new coding tools that better compress content that is (fully or partially) computer generated, leading to a significant improvement in compression, i.e., approximately 50% rate reduction or more for specific screen content.

Audio

The audio subgroup is mainly concentrating on 3D audio where they identified the need for intermediate bitrates between 3D audio phase 1 and 2. Currently, phase 1 identified 256, 512, 1200 kb/s whereas phase 2 focuses on 128, 96, 64, 48 kb/s. The broadcasting industry needs intermediate bitrates and, thus, phase 2 is extended to bitrates between 128 and 256 kb/s.

3DG

MPEG 3DG is working on point cloud compression (PCC), for which open-source software has been identified. Additionally, there are new activities in the areas of the Media Internet of Things (MIoT) and wearable computing (like glasses and watches) that could lead to new standards developed within MPEG. Therefore, stay tuned on these topics as they may shape your future.

The week after the MPEG meeting I met the MPEG convenor and the JPEG convenor again during ICME2015 in Torino but that's another story...
L. Chiariglione, H. Hellwagner, T. Ebrahimi, C. Timmerer (from left to right) during ICME2015. Photo (c) T. Ebrahimi.