Tuesday, June 6, 2017

QoMEX'17 Review: Down the Rabbit Hole - Immersive Experience

During QoMEX 2017 in Erfurt, Germany, we had a special session entitled "Down the Rabbit Hole", which I have already introduced here. The papers of the special session will soon appear in IEEE Xplore, but together with my co-organizers of this special session, Raimund Schatz and Judith Redi, we also wanted to run the special session in a special way. Therefore, we asked authors to prepare concise and thought-provoking paper presentations (~15 min incl. Q&A; paper title, presenter, picture, and keywords below) to save time for a panel discussion. Surprisingly, it worked very well and the special session turned out to be worthwhile and informative. In order to keep the audience connected and involved, we showed a single slide with all panelists (i.e., the paper presenters) throughout the session (see below).

The discussion centered on the question "what is your understanding of a fully immersive experience?", which revealed interesting aspects and finally led to the main challenge: how to quantify immersive experience. In this context, Mr. T. (only those who attended this session at QoMEX know why he is called Mr. T.; join us next time and we will explain what's behind it) raised an interesting idea, namely to interpret the Turing test for immersive experience. That is, a fully or truly immersive experience is achieved if a human is no longer aware that she/he is actually interacting with cyber-physical systems. I think this statement sets the bar high, but it is definitely worth considering.

Finally, I'd like to thank all presenters/panelists for an amazing special session at QoMEX'17, but the journey is not over yet. I'll be attending ACM MMSys and IEEE ICME, presenting/discussing various aspects of immersive experiences, and also the MPEG meeting in Torino, which will be dedicated to standardization aspects of immersive experiences.

Also, big big thanks to the conference organizers, the team around the general chair Alexander Raake (TU Ilmenau, Germany), for hosting such a wonderful event! Hope to see you all next year at QoMEX 2018.

Feel free to test/play around with Bitmovin solutions for VR/360-degree streaming, and if you have a RICOH THETA S, check out my blog post on how to set up a live streaming session.

Come and join us on the journey down the rabbit hole which eventually will lead to wonderland.

Tuesday, May 30, 2017

MPEG news: a report from the 118th meeting, Hobart, Australia

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.

MPEG News Archive
The entire MPEG press release can be found here and comprises the following topics:
  • Coded Representation of Immersive Media (MPEG-I): new work item approved and call for test data issued
  • Common Media Application Format (CMAF): FDIS approved
  • Beyond High Efficiency Video Coding (HEVC): call for evidence for "beyond HEVC" and verification tests for screen content coding extensions of HEVC

Coded Representation of Immersive Media (MPEG-I)

MPEG started to work on a new work item referred to as ISO/IEC 23090, with the "nickname" MPEG-I, targeting future immersive applications. The goal of this new standard is to enable various forms of audio-visual immersion, including panoramic video with 2D and 3D audio and various degrees of true 3D visual perception. It currently comprises five parts: (pt. 1) a technical report describing the scope of this new standard and a set of use cases and applications; (pt. 2) an application format for omnidirectional media (aka OMAF) to address the industry's urgent need for a standard in this area; (pt. 3) immersive video, which is a kind of placeholder for the successor of HEVC (if at all); (pt. 4) immersive audio as a placeholder for the successor of 3D audio (if at all); and (pt. 5) point cloud compression. The point cloud compression standard targets lossy compression of point clouds for real-time communication, six Degrees of Freedom (6 DoF) virtual reality, dynamic mapping for autonomous driving, cultural heritage applications, etc. Part 2 is OMAF, which I've discussed in my previous blog post.

MPEG also established an Ad-hoc Group (AhG) on Immersive Media Quality Evaluation with the following mandates:
  1. Produce a document on VR QoE requirements;
  2. Collect test material with immersive video and audio signals;
  3. Study existing methods to assess human perception and reaction to VR stimuli;
  4. Develop test methodology for immersive media, including simultaneous video and audio;
  5. Study VR experience metrics and their measurability in VR services and devices.
AhGs are open to everybody and mostly conduct their discussions via mailing lists (join here: https://lists.aau.at/mailman/listinfo/immersive-quality). Interestingly, a Joint Qualinet-VQEG team on Immersive Media (JQVIM) has recently been established with similar goals, and the VR Industry Forum (VRIF) has issued a call for VR360 content. It seems there's a strong need for a dataset similar to the one we created for MPEG-DASH a long time ago.

The JQVIM has been created as part of the QUALINET task force on "Immersive Media Experiences (IMEx)", which aims at providing end users with the sensation of being part of the particular media, resulting in a worthwhile, informative user and quality of experience. The main goals are providing datasets and tools (hardware/software), subjective quality evaluations, field studies, and cross-validation, including a strong theoretical foundation alongside the empirical databases and tools, which hopefully results in a framework, methodology, and best practices for immersive media experiences.

Common Media Application Format (CMAF)

The Final Draft International Standard (FDIS) was issued at the 118th MPEG meeting, which concludes the formal technical development process of the standard. At this point, national bodies can only vote Yes or No, and only editorial changes (if any) are allowed before the International Standard (IS) becomes available. The goal of CMAF is to define a single format for the transport and storage of segmented media, including audio/video formats, subtitles, and encryption; it is derived from the ISO Base Media File Format (ISOBMFF). As it's a combination of various MPEG standards, it's referred to as an Application Format (AF), which mainly takes existing formats/standards and glues them together for a specific target application. The CMAF standard clearly targets dynamic adaptive streaming (over, but not limited to, HTTP) but focuses on the media format only, excluding the manifest format. Thus, the CMAF standard shall be compatible with other formats such as MPEG-DASH and HLS. In fact, HLS was extended some time ago to support 'fragmented MP4', which we have also demonstrated, and this has been interpreted as a first step towards the harmonization of MPEG-DASH and HLS, at least regarding the segment format. The delivery of CMAF content with DASH will be described in part 7 of MPEG-DASH, which basically comprises a mapping of CMAF concepts to DASH terms.
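Since CMAF media is carried in ISOBMFF (fragmented MP4) structures, a segment is essentially a sequence of boxes, each starting with a 32-bit big-endian size and a four-character type. The Python sketch below walks the top-level boxes of a byte buffer and handles the two ISOBMFF special cases: size == 1 (a 64-bit "largesize" follows) and size == 0 (the box extends to the end of the data). It is only an illustration under these assumptions, not a CMAF conformance checker; the helper names iter_boxes and make_box are hypothetical, and the synthetic segment uses dummy payloads, so it is not a valid CMAF segment.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, start_offset, box_size) for each box in data[offset:end].

    Trailing bytes shorter than a box header are ignored.
    """
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Every ISOBMFF box starts with a 32-bit big-endian size and a 4-char type.
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header_len = 8
        if size == 1:
            # size == 1: the actual size follows as a 64-bit "largesize".
            if offset + 16 > end:
                raise ValueError(f"truncated largesize header at offset {offset}")
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_len = 16
        elif size == 0:
            # size == 0: the box extends to the end of the enclosing data.
            size = end - offset
        if size < header_len or offset + size > end:
            raise ValueError(f"malformed box at offset {offset}")
        yield raw_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a plain box (32-bit size field) around an arbitrary payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    # Synthetic "segment" with dummy payloads: not a valid CMAF segment.
    segment = (make_box(b"styp", b"\x00" * 8)
               + make_box(b"moof", b"")
               + make_box(b"mdat", b"\x00" * 16))
    for box_type, start, size in iter_boxes(segment):
        print(box_type, start, size)
    # styp 0 16
    # moof 16 8
    # mdat 24 24
```

In a real fragmented MP4 media segment, one would typically find top-level boxes such as styp, moof and mdat (possibly alongside others); a fuller tool would also descend into container boxes like moof.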

From a research perspective, it would be interesting to explore how certain CMAF concepts are able to address current industry needs, specifically in the context of low-latency streaming which has been demonstrated recently.

Beyond HEVC...

The preliminary call for evidence (CfE) on video compression with capability beyond HEVC has been issued and is addressed to interested parties that have technology providing better compression capability than the existing standard, either for conventional video material or for other domains such as HDR/WCG or 360-degree ("VR") video. Test cases are defined for SDR, HDR, and 360-degree content. This call has been made jointly by ISO/IEC MPEG and ITU-T SG16/Q6 (VCEG). The evaluation of the responses is scheduled for July 2017, and depending on the outcome of the CfE, the parent bodies of the Joint Video Exploration Team (JVET), MPEG and VCEG, intend to issue a Draft Call for Proposals by the end of the July meeting.

Finally, verification tests have been conducted for the Screen Content Coding (SCC) extensions to HEVC showing exceptional performance. Screen content is video containing a significant proportion of rendered (moving or static) graphics, text, or animation rather than, or in addition to, camera-captured video scenes. For scenes containing a substantial amount of text and graphics, the tests showed a major benefit in compression capability for the new extensions over both the Advanced Video Coding standard and the previous version of the newer HEVC standard without the new SCC features.

The question of whether and how new codecs like (beyond) HEVC compete with AV1 is subject to research and development. It has also been discussed in the scientific literature, but vendor-neutral comparisons are lacking; they are difficult to achieve without comparing apples with oranges (due to the high number of different coding tools and parameters). An important aspect that always needs to be considered is that one typically compares specific implementations of a coding format and not the standard itself, as the standard usually does not define the encoder, only the bitstream syntax, which implicitly defines the decoder.

Publicly available documents from the 118th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Torino, Italy, July 17-21, 2017. Feel free to contact us for any questions or comments.

Tuesday, May 23, 2017

The Evolution of Programming Languages and Computer Architectures over the Last 50 Years

Prof. Niklaus Wirth

June 12, 2017, 16:00
Alpen-Adria-Universität Klagenfurt, E.2.42

Please register via martina@itec.aau.at

We recount the development of procedural programming languages and of computer architectures, beginning with Algol 60 and the mainframe computers, and discuss the influence of the former on the latter. We point out the major innovative features of computers, and the main characteristics of languages. What makes languages high-level, and what caused their cancerous growth and overwhelming complexity? Are we stuck with the monsters, or is further, sound development still possible?

© Peter Badge/Typos1 – in coop. with HLFF - all rights reserved 2017
Niklaus Wirth is one of the most influential computer scientists ever. He is known first of all for his work in programming language and compiler design, but he has also contributed a lot to hardware and operating system design and to software engineering in a general sense. He spent most of his working life as a professor at ETH Zürich, but also spent several years at outstanding research institutions in the USA (e.g., Xerox PARC) and Canada.

His best-known programming language is Pascal. Pascal was published at the end of the sixties, at a time when widely used but theoretically poorly founded languages (such as Fortran and Cobol) on the one hand, and theoretically exaggerated and practically hardly usable languages (such as Algol-68) on the other hand, dominated the scene. With Pascal, Wirth succeeded in finding the happy medium. It was the first programming language 1) incorporating the sound theory of safe programming (as defined by E.W. Dijkstra, C.A.R. Hoare and others, including Wirth himself); 2) applying strict, static type checking; and 3) providing a flexible system of recursive type constructors. In other words: strictness regarding syntax, but freedom in expressing semantics. In later languages, Wirth adapted the concepts of encapsulation and information hiding (Modula and Modula-2) and object orientation (Oberon and Oberon-2) in a novel, clean, and simple way. Oberon was not only the name of a language but also of an extremely compact yet extensible operating system, which enabled, among other things, possibly the world's first efficient garbage collector. He also designed a hardware architecture tailored to the requirements of code generation by compilers (the Lilith architecture), thus becoming a pioneer of later RISC architectures. He also designed a simple and compact language for hardware design (LOLA). The leading principle in all his work was the slogan taken from Albert Einstein: “Make it as simple as possible – but not simpler!”

Niklaus Wirth has published more than 10 books and numerous scientific papers. For a few years, he was the most cited computer scientist of all. He has received practically every award a computer scientist can get, first of all the Turing Award, which is often called “the Nobel prize for computer scientists”. He is a member of the order Pour le mérite for science and art and of the German Academy of Sciences, and he received the IEEE Computer Pioneer Award, the Outstanding Research Award in Software Engineering from ACM SIGSOFT, and many others. Niklaus Wirth is an excellent speaker: humble, wise, and with a great sense of humor. This makes his talks unforgettable events for any audience. The Institute of Information Technology at the University of Klagenfurt is highly honored and pleased that he accepted our invitation.

Monday, May 22, 2017

TNC17 presentation: Over-the-Top Content Delivery: State of the Art and Challenges

Over-the-Top Content Delivery: State of the Art and Challenges

Christian Timmerer (AAU/Bitmovin)

Abstract: Over-the-top content delivery is becoming increasingly attractive for both live and on-demand content thanks to the popularity of platforms like YouTube, Vimeo, Netflix, Hulu, Maxdome, etc. In this tutorial, we present state of the art and challenges ahead in over-the-top content delivery. In particular, the goal of this tutorial is to provide an overview of adaptive media delivery, specifically in the context of HTTP adaptive streaming (HAS) including the recently ratified MPEG-DASH standard. The main focus of the tutorial will be on the common problems in HAS deployments such as client design, QoE optimization, multi-screen and hybrid delivery scenarios, and synchronization issues. For each problem, we will examine proposed solutions along with their pros and cons. In the last part of the tutorial, we will look into the open issues and review the work-in-progress and future research directions.

TNC17 Networking Conference - The Art of Creative Networking: https://tnc17.geant.org/

Please feel free to contact me for details and/or I'd be happy to meet you at TNC17.

Friday, May 19, 2017

QoMEX'17 Special Session 2: Down the Rabbit Hole – General Aspects of VR and the Immersion Experience

QoMEX 2017
May 31 – June 2, 2017 in Erfurt, Germany

Picture from http://www.chalquist.com/fantastophobia.html


  • Raimund Schatz, Austrian Institute of Technology (AIT), Austria
  • Christian Timmerer, Alpen-Adria-Universität (AAU) Klagenfurt, Austria
  • Judith Redi, Delft University of Technology, The Netherlands


Currently, we are witnessing a proliferation of new products and applications based on immersive media technologies, exemplified best by the current “VR hype”, which is causing a flurry of new devices and applications to hit the marketplace. The potential of immersive media is large; however, the investigation and application of QoE in this context is still in its infancy, as many issues are not yet understood. Consequently, the multimedia-quality community faces new, rapidly moving research targets, resulting in the following overarching questions:
  • What is the interplay between concepts like Immersion, Presence, Interactivity, Multimedia Quality and User Experience in the context of emerging immersive applications and technologies?
  • What should be the role of QoE in this domain?
  • How to characterize, assess, model and manage QoE for immersive media-rich applications?
This special session aims to bring together researchers and practitioners to present and discuss quality-related issues and topics in the field of immersive media. The goal is to highlight the major challenges the multimedia quality community should target in this dynamically evolving domain and identify ways forward to addressing them effectively.

All QoMEX'17 special sessions can be found here.

Schedule & Format

Thursday, June 1, 2017, 13:00 -- 15:00
  1. Chenyan Zhang, Andrew Perkis and Sebastian Arndt, Spatial Immersion versus Emotional Immersion, Which is More Immersive?
  2. Conor Keighrey, Ronan Flynn, Siobhan Murray and Niall Murray, A QoE Evaluation of Augmented and Immersive Virtual Reality Speech & Language Assessment Applications 
  3. Raimund Schatz, Andreas Sackl, Christian Timmerer and Bruno Gardlo, Towards Subjective Quality of Experience Assessment for Omnidirectional Video Streaming
  4. Ashutosh Singla, Stephan Fremerey, Werner Robitza and Alexander Raake, Measuring and Comparing QoE and Simulator Sickness of Omnidirectional Videos in Different Head Mounted Displays
  5. Yashas Rai, Patrick Le Callet and Philippe Guillotel, Which saliency weighting for omni directional image quality assessment?
  6. Evgeniy Upenik, Martin Rerabek and Touradj Ebrahimi, On the Performance of Objective Metrics for Omnidirectional Visual Content
The special session format will be as follows:
  • 15min slots per presentation (12-13min talk, 1min for 1 question, 1-2 min time for speaker change)
  • 30min for panel discussion

Panel Discussion

For the panel discussion, our aim is to address the following questions:
  1. What is your understanding of a fully immersive experience (provide a definition or propose keywords to characterize immersive experiences)?
  2. Which aspects should the QoMEX community focus on?
  3. What can we learn and which knowledge can we re-use from the 3D and HDR research experiences?
... and we'd also like to solicit input from the community at large. Therefore, we set up a shared Google doc, which is available here, asking for YOUR input: http://bit.ly/QoMEXSpS2.

Come and join us on the journey down the rabbit hole which eventually will lead to wonderland.

Monday, May 8, 2017

Joint QUALINET-VQEG team on Immersive Media (JQVIM)

QUALINET and VQEG have an ambition to increase the collaboration between the organizations. As a pilot effort we are hereby proposing a Joint Qualinet-VQEG team on Immersive Media (JQVIM). The actual collaboration will be between the Task Force: "Immersive Media Experiences (IMEx)" of Qualinet and the Working Group "Immersive Media Group (IMG)" of VQEG.

The initial goals for JQVIM are:
  • Collecting and producing open source immersive media content and datasets
  • Establishing and recommending best practices and guidelines
  • Collecting and producing open source immersive media tools
  • Survey of standardisation activities
Anybody who's interested in joining this effort, please contact me. Interestingly, this effort is related to my previous post, where the VR Industry Forum (VRIF) calls for VR360 content, and also to the MPEG Ad-Hoc Group (AhG) on Immersive Media Quality Evaluation, which has similar mandates.

Friday, May 5, 2017

VR Industry Forum (VRIF) calls for VR360 Content

1 Introduction

The VR Industry Forum (VRIF) is a cross-industry Forum that has as its purpose “to further the widespread availability of high quality audiovisual VR experiences, for the benefit of consumers”. VRIF builds on standards created by formal Standards Development Organizations, such as MPEG, and seeks to use these standards to enable the interoperable deployment of high-quality VR360 services.

VRIF’s initial focus is on Three Degrees-of-Freedom (3-DoF) video and audio. VRIF is now calling for content to build a content library with material for the purpose of providing public test vectors that may be used by content providers and service providers as a reference, and by manufacturers of applications and devices to test implementations against the VRIF guidelines. VRIF’s hope is to build a library of content that can be widely used by industry to test and promote VR services.

VRIF prefers to receive VR360 content accompanied with 3D spatial audio. If you are willing to contribute other forms of content that may be relevant to VRIF, please contact us.

2 License

VRIF calls for content with as few restrictions as possible. It must be possible to use the content for testing purposes within VRIF, and for demonstration at private and public events by VRIF members. It is also highly desirable for the content to be available for general research, development, and demonstration of audio/visual or image signal processing technology. It must be possible to extract single-frame images from the content for inclusion in technical publications.

VRIF prefers content that is licensed under a Creative Commons license, as documented here: https://creativecommons.org/share-your-work/licensing-types-examples/

If you are considering making content available but would like to impose a few specific restrictions, then VRIF is willing to consider such restrictions as long as these are consistent with VRIF’s intended use.

3 Use Case

VRIF develops use cases that drive our Guidelines. The relevant aspects of the current use case are provided in this Section 3. The derived requirements for the test material are provided in Section 4.

A service provider offers a library of 360 A/V content. The library is a mixture of content formats from user generated content, professionally generated studio content, VR documentaries, promotional videos, as well as highlights of sports events. The content enables changing the field-of-view based on user interaction.

The service provider wants to create a portal to distribute the content to a multitude of devices that support 360-A/V and VR processing and rendering. The service provider wants to target two types of applications:
  • Primarily, viewing in an HMD with head motion tracking. 
  • Additionally, the content provider may enable viewing on a “flat screen” with the user selecting the field-of-view through manual interaction (e.g. mouse input or swiping). 
The service provider expects different types of consumption and rendering devices with different capabilities in terms of decoding and rendering. The service provider has access to the original footage of the content and is permitted to encode and transcode to appropriate distribution formats.

The footage includes different types of 360 A/V VR content, such as: 

For video, one of the following three:
  • Pre-stitched monoscopic video, i.e., a spherical video (360 degrees, possibly less) without depth perception, with Equirectangular Projection (ERP).
  • Pre-stitched stereoscopic video, i.e., a spherical video using a separate input for each eye, typically with ERP.
  • Fish-eye content, typically user-generated.
For video, original content:
  • Original content, either in the original uncompressed domain or in a high-quality mezzanine format.
  • Basic VR content: 4k x 2k in equirectangular projection (ERP), 8 or 10bit, BT.709, 30fps and up.
  • High-quality content: 8k x 4k (ERP), 10 bit, possibly advanced transfer characteristics and color transforms, sufficiently high frame rates, etc. 
Sufficient metadata is provided to appropriately describe the A/V content

For audio
  • Spatial audio content for immersive experiences:
    • Channel-based audio
    • Object-based audio
    • Scene-based audio
    • Or a combination of the above 
  • Sufficient metadata for encoding, decoding and rendering the spatial audio scene permitting dynamic interaction with the content. This may include additional metadata that is also used in regular TV applications, such as for loudness management. 
  • Diegetic and non-diegetic audio content.
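The ERP format referenced above maps the full sphere onto a 2:1 rectangle, with longitude (yaw) along the horizontal axis and latitude (pitch) along the vertical axis. The Python sketch below illustrates that mapping and the resulting angular pixel density, which is one way to see why basic (~4k x 2k) and high-quality (~8k x 4k) content are distinguished. It assumes a simplified convention (yaw 0 at the horizontal centre, pitch +90 at the top edge, pixel-centre offsets ignored) and example frame sizes of 3840 x 1920 and 7680 x 3840; OMAF specifies its own normative sample-location conventions, which this sketch does not reproduce. All function names are hypothetical.

```python
from typing import Tuple


def sphere_to_erp(yaw_deg: float, pitch_deg: float,
                  width: int, height: int) -> Tuple[float, float]:
    """Map a viewing direction to continuous (x, y) coordinates in an ERP frame.

    Assumed (simplified, non-normative) convention: yaw in degrees, wrapped to
    [-180, 180), 0 at the horizontal centre; pitch in degrees within [-90, 90],
    +90 (straight up) at the top edge.
    """
    yaw = (yaw_deg + 180.0) % 360.0 - 180.0  # wrap yaw into [-180, 180)
    x = (yaw + 180.0) / 360.0 * width
    y = (90.0 - pitch_deg) / 180.0 * height
    return x, y


def erp_to_sphere(x: float, y: float,
                  width: int, height: int) -> Tuple[float, float]:
    """Inverse of sphere_to_erp under the same assumed convention."""
    yaw = x / width * 360.0 - 180.0
    pitch = 90.0 - y / height * 180.0
    return yaw, pitch


def pixels_per_degree(width: int) -> float:
    """Horizontal angular resolution of an ERP frame, in pixels per degree."""
    return width / 360.0


if __name__ == "__main__":
    print(sphere_to_erp(0.0, 0.0, 3840, 1920))   # (1920.0, 960.0)
    print(round(pixels_per_degree(3840), 2))      # 10.67
    print(round(pixels_per_degree(7680), 2))      # 21.33
```

Note that this density holds along the equator; ERP increasingly oversamples towards the poles, so the effective resolution in a head-mounted display viewport depends on where the user looks.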

4 Test Material Requirements 

We are seeking content with the following characteristics:
  • Sequences have zero or few issues in the original form (stitching, noise)
  • Content: 
    • Basic Video VR content: approximately 4k x 2k (ERP, 8 or 10 bit, BT.709), as low as 25/30 fps, but also 50/60 fps
    • High-quality Video Content: approximately 6k x 3k, 8k x 4k and up (ERP), 10 bit, possibly advanced transfer characteristics and colour transforms, possibly even higher frame rates, etc.
    • Monoscopic or Stereoscopic
  • Audio along with this:
    • preferably 3D spatial audio, time-synchronized and spatially aligned with the video, provided in the following formats:
      • Channel-based audio 
      • Object-based audio
      • Scene-based audio
      • Or a combination of the above
    • Sufficient metadata for encoding, decoding and rendering the spatial audio scene permitting dynamic interaction with the content. The metadata may include additional metadata that is also used in regular TV applications, such as for loudness management. 
    • Diegetic and non-diegetic audio content.
  • Duration: between 30 seconds and 2 minutes.
  • Type of content:
    • Sports
    • Live events (e.g., music/concerts)
    • Outdoor scenery (nature or urban)
    • professionally produced indoor
  • Artistic characteristics:
    • natural and synthetically generated (but still coded as video)
    • moving or static ROI
    • preference for fixed camera; optionally moving camera
  • Packaging
    • Video: Raw or lightly compressed mezzanine format (to be worked out)
    • Audio: uncompressed produced Audio assets, or lightly compressed 

If you have content that does not meet all requirements, please get in touch, as we are interested in understanding whether it would still be useful for our purposes.
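Several of the test material requirements are mechanical, so a contributor might pre-screen a clip before getting in touch. The Python sketch below is hypothetical: the Submission fields, the numeric thresholds (3840 and 5760 pixels wide as stand-ins for "approximately" 4k and 6k), and the tier names are assumptions of this example, not VRIF definitions. It only covers video parameters and duration, not audio, content type, or packaging, and, in the spirit of the call, it reports problems instead of rejecting a clip outright.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; the call itself only says "approximately".
BASIC_MIN_WIDTH = 3840   # stand-in for ~4k x 2k ERP
HIGH_MIN_WIDTH = 5760    # stand-in for ~6k x 3k ERP
MIN_FPS = 25.0           # "as low as 25/30 fps"
MIN_DURATION_S = 30.0    # "between 30 seconds and 2 minutes"
MAX_DURATION_S = 120.0


@dataclass
class Submission:
    width: int
    height: int
    bit_depth: int
    fps: float
    duration_s: float


def screen(s: Submission) -> Tuple[str, List[str]]:
    """Return a (tier, problems) pair for a candidate clip."""
    problems: List[str] = []
    if not MIN_DURATION_S <= s.duration_s <= MAX_DURATION_S:
        problems.append("duration should be between 30 s and 2 min")
    if s.bit_depth not in (8, 10):
        problems.append("bit depth should be 8 or 10 bit")
    if s.fps < MIN_FPS:
        problems.append("frame rate below 25 fps")
    if s.width >= HIGH_MIN_WIDTH and s.bit_depth == 10:
        tier = "high-quality"  # high-quality content is described as 10 bit
    elif s.width >= BASIC_MIN_WIDTH:
        tier = "basic"
    else:
        tier = "below basic"
        problems.append("resolution below approximately 4k x 2k")
    return tier, problems


if __name__ == "__main__":
    print(screen(Submission(7680, 3840, 10, 60.0, 90.0)))
    # ('high-quality', [])
    print(screen(Submission(3840, 1920, 8, 30.0, 20.0)))
    # ('basic', ['duration should be between 30 s and 2 min'])
```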

5 Credits

VRIF is happy to acknowledge sponsors and contributors of the content by providing credits, in one or more of the following ways:
  • on the VRIF website, 
  • along with the hosting (e.g., the download page) 
  • modestly embedded in the content itself, in a way that doesn’t detract from that content. 

6 Contacts

For questions or to respond to this call, please contact:
  • Rob Koenen: rob.koenen@tno.nl
  • Thomas Stockhammer: tsto@qti.qualcomm.com 
  • VR Industry Forum: info@vr-if.org