Saturday, April 1, 2017

VR/360/Immersive Media (Streaming) Standardization-Related Activities

April 1, 2017 (1st version), September 14, 2017 (2nd version), May 22, 2018 (3rd version)

Universal media access (UMA), as proposed in the late 1990s and early 2000s, is now reality. It is very easy to generate, distribute, share, and consume any media content, anywhere, anytime, and with/on any device. These kinds of real-time entertainment services (specifically, streaming audio and video) are typically deployed over the open, unmanaged Internet and now account for more than 70% of the evening traffic in North American fixed access networks. It is assumed that this number will reach 80% by the end of 2020. A major technical breakthrough and enabler was certainly adaptive streaming over HTTP, which resulted in the standardization of MPEG-DASH.

One of the next big things in adaptive media streaming is most likely related to virtual reality (VR) applications and, specifically, omnidirectional (360-degree) media streaming, which is currently built on top of the existing adaptive streaming ecosystems. The major interfaces of such an ecosystem are shown below and were described in a Bitmovin blog post some time ago (note: it has now evolved into Immersive Media, referred to as MPEG-I).


Omnidirectional video (ODV) content allows the user to change her/his viewing direction while consuming the video, resulting in a more immersive experience than consuming traditional video content with a fixed viewing direction. Such video content can be consumed using different devices ranging from smartphones and desktop computers to special head-mounted displays (HMD) like Oculus Rift, Samsung Gear VR, HTC Vive, etc. When using an HMD to watch such content, the viewing direction can be changed by head movements. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around thanks to built-in sensors. On a desktop computer, the mouse or keyboard can be used for interacting with the omnidirectional video.

The streaming of ODV content is currently deployed in a naive way by simply streaming the entire 360-degree scene/view in constant quality without exploiting and optimizing the quality for the user’s viewport.
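To get a rough feeling for how much of the sphere is actually visible at any time, here is a minimal sketch in Python. It uses the standard solid-angle formula for a rectangular pyramid; the function name and the 90° x 90° field of view are my own illustrative assumptions (real HMDs differ), and the result is a purely geometric share, not a measured bitrate saving.

```python
import math


def viewport_fraction(h_fov_deg: float, v_fov_deg: float) -> float:
    """Fraction of the full sphere covered by a rectangular viewport.

    Uses the solid angle of a rectangular pyramid:
    omega = 4 * asin(sin(h/2) * sin(v/2)), with the full sphere being 4*pi.
    """
    half_h = math.radians(h_fov_deg) / 2
    half_v = math.radians(v_fov_deg) / 2
    solid_angle = 4 * math.asin(math.sin(half_h) * math.sin(half_v))
    return solid_angle / (4 * math.pi)


# Assumed (hypothetical) viewport of 90 x 90 degrees.
fraction = viewport_fraction(90, 90)
print(f"visible share of the sphere: {fraction:.3f}")
```

Under this assumption the viewport covers about one sixth of the sphere, which is why streaming the whole scene in constant quality spends most of the bits on regions the user is not looking at.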

There are several standardization-related activities ongoing which I'd like to highlight in this blog post. I started with streaming-related aspects but now also cover general aspects related to immersive media applications.

The VR Industry Forum (VRIF) was established with the aim "to further the widespread availability of high quality audiovisual VR experiences, for the benefit of consumers" and comprises working groups related to requirements, guidelines, interoperability, communications, and liaison. VRIF published guidelines at CES 2018, which are available here. The initial release of the VRIF Guidelines focuses on the delivery ecosystem of 360° video with three degrees of freedom (3DOF) and incorporates:
  • Documentation of cross-industry interoperability points, based on ISO MPEG’s Omnidirectional Media Format (OMAF)
  • Best industry practices for production of VR360 content, with an emphasis on human factors such as motion sickness
  • Security considerations for VR360 streaming, focusing on content protection but also looking at user privacy.
Topics to be addressed in 2018 by VRIF include live virtual reality services and support for high dynamic range (HDR).

QUALINET is a European network concerned with Quality of Experience (QoE) in multimedia systems and services. In terms of VR/360, it runs a task force on "Immersive Media Experiences (IMEx)" to which everyone is invited to contribute. QUALINET also coordinates standardization activities in this area. It can help organize and conduct formal QoE assessments in various domains. For example, it conducted various experiments during the development of MPEG-H High Efficiency Video Coding (HEVC). It recently established a Joint Qualinet-VQEG team on Immersive Media (JQVIM), together with VQEG (see also below), and everyone is welcome to join (details can be found here).

JPEG started an initiative called Pleno focusing on images. At the 76th JPEG meeting in Turin, Italy, responses to the call for proposals for JPEG Pleno light field image coding were evaluated using subjective and objective evaluation metrics, and a Generic JPEG Pleno Light Field Architecture was created. The JPEG committee defined three initial core experiments to be performed before the 77th JPEG meeting in Macau, China. Additionally, the JPEG XS requirements document references VR applications, and JPEG recently created an AhG on JPEG360 with the mandate to collect and define use cases for 360-degree image capture applications, develop requirements for such use cases, solicit industry engagement, collect evidence of existing solutions, and update the description of needed metadata.

In terms of MPEG, I've previously reported about MPEG-I as part of my MPEG report (also here for the most recent MPEG report), which currently includes five parts. The first part will be a technical report describing the scope of this new standard and a set of use cases and applications from which actual requirements can be derived. Technical reports are usually publicly available for free. The second part specifies the omnidirectional media application format (OMAF), addressing the urgent need of the industry for a standard in this area. Part three will address immersive video and part four defines immersive audio. Finally, part five will contain a specification for point cloud compression, for which a call for proposals is currently available. OMAF is part of a first phase of standards related to immersive media and the FDIS is available already. The OMAF 2nd edition started with the goal to support 3DoF+ and social VR. MPEG-I architectures (yes, plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closely at 3DoF+, defining common test conditions, with a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF and compression of dense representations of light fields are ongoing and have been started, respectively. Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools, resulting in updated versions of the test model and working draft.

The Spatial Relationship Descriptor (SRD) of the MPEG-DASH standard provides means to describe how the media content is organized in the spatial domain. In particular, the SRD is fully integrated in the media presentation description (MPD) of MPEG-DASH and is used to describe a grid of rectangular tiles, which allows a client implementation to request only a given region of interest, typically associated with a contiguous set of tiles. Interestingly, the SRD was developed before OMAF, and how SRD is used with OMAF is currently subject to standardization. Speaking of MPEG-DASH, there has been a presentation at the 3GPP/VRIF workshop, which is available here.
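To make the SRD tile grid a bit more concrete, here is a minimal sketch in Python of how a client might pick the tiles that overlap a region of interest. It assumes the SRD value string carries the comma-separated fields source_id, object_x, object_y, object_width, object_height, total_width, total_height (optionally followed by spatial_set_id). The 4x2 grid, the frame size, the helper names, and the horizontal wrap-around handling are my own assumptions for illustration and not something SRD or OMAF mandates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tile:
    """One tile as described by an SRD value string."""
    source_id: int
    x: int
    y: int
    w: int
    h: int
    total_w: int
    total_h: int


def parse_srd(value: str) -> Tile:
    # Keep the first seven integer fields; an optional spatial_set_id is ignored.
    fields = [int(f) for f in value.split(",")]
    return Tile(*fields[:7])


def overlaps(tile: Tile, x: int, y: int, w: int, h: int) -> bool:
    """True if the tile intersects the region of interest (ROI).

    Assumption: 360-degree content wraps horizontally, so an ROI that
    crosses the right frame edge is split into two rectangles.
    """
    def hit(rx: int, rw: int) -> bool:
        return (tile.x < rx + rw and rx < tile.x + tile.w
                and tile.y < y + h and y < tile.y + tile.h)

    if x + w <= tile.total_w:
        return hit(x, w)
    return hit(x, tile.total_w - x) or hit(0, x + w - tile.total_w)


def select_tiles(tiles: List[Tile], x: int, y: int, w: int, h: int) -> List[Tile]:
    return [t for t in tiles if overlaps(t, x, y, w, h)]


# Hypothetical 4x2 grid of 960x960 tiles over a 3840x1920 frame.
srd_values = [f"0,{c * 960},{r * 960},960,960,3840,1920"
              for r in range(2) for c in range(4)]
tiles = [parse_srd(v) for v in srd_values]

# ROI that crosses the right frame edge and spans both tile rows.
roi_tiles = select_tiles(tiles, x=3360, y=480, w=960, h=960)
print([(t.x, t.y) for t in roi_tiles])
```

A real client would map the selected tiles back to the corresponding Adaptation Sets in the MPD and would typically still fetch the remaining tiles in lower quality, so that fast head movements do not end up showing blank regions.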

In 3GPP, the TSG SA WG4 (3GPP SA4) Video SWG deals with a Rel-15 work item on VR Streaming and a Rel-16 Study Item on QoE Reports for VR Streaming. During SA4#98 in April 2018, the following happened:
  • Significant progress on Video Operation Points: 4 are defined in total, 2 of them legacy (ERP and mono only) and 2 of them using MPEG/OMAF features for viewport-dependent and stereo distribution.
  • Progress on the media profiles for video, with 5 Media Profiles: 4 directly enabling distribution of the operation points with File Format and DASH, and one media profile for tile-based streaming.
  • On audio, the audio architecture was updated to include different Renderers as well as the relevant APIs. An audio profile submission process was agreed in S4-180629, with some remaining issues to be clarified in telcos, enabling the characterization of audio technologies in a VR environment including head tracking and binauralization.
  • An updated TS26.118v0.5.0 was produced in S4-180559, including all agreements of the meeting, and will be sent to SA plenary for information.
  • The work is expected to be completed within the Rel-15 time plan, i.e., by Sep 2018.
  • On the QoE metrics the latest information can be found in TR26.926v0.3.0 in S4-180560.

IEEE has started IEEE P2048 and, specifically, "P2048.2 Standard for Virtual Reality and Augmented Reality: Immersive Video Taxonomy and Quality Metrics" (to define different categories and levels of immersive video) and "P2048.3 Standard for Virtual Reality and Augmented Reality: Immersive Video File and Stream Formats" (to define formats of immersive video files and streams, and the functions and interactions enabled by the formats), but not much material is available right now. However, P2048.2 seems to be related to QUALINET, and P2048.3 could definitely benefit from what MPEG has done and is still doing (incl. also, e.g., MPEG-V). Additionally, there's IEEE P3333.3, defining a standard for HMD-based 3D content motion sickness reducing technology, which aims to resolve the VR sickness caused by the visual mechanism of HMD-based 3D content through the study of i) visual response to the focal distortion, ii) visual response to the lens materials, iii) visual response to the lens refraction ratio, and iv) visual response to the frame rate.

The ITU-T started a new work program referred to as "G.QoE-VR" after successfully finalizing P.NATS, which is now called P.1203. However, there are no details about "G.QoE-VR" publicly available yet; I only found this here. According to @slhck, G.QoE-VR will generally focus on HMD-based VR streaming, investigation of subjective test methodologies and, later, instrumental QoE models. This is also confirmed here with the expected deliverables from this study group, namely recommendations on QoE factors and requirements for VR, subjective test methodologies for assessing VR quality, and objective quality estimation model(s) for VR services. In this context, it is worth mentioning the Video Quality Experts Group (VQEG), which has an Immersive Media Group (IMG) with a mission of "quality assessment of immersive media, including virtual reality, augmented reality, stereoscopic 3DTV, multiview". IMG is also involved in JQVIM introduced above.

Finally, the Khronos group presented at the 3GPP/VRIF workshop (the presentation is accessible here) and an overview of OpenXR (March’18) can be found here. The Khronos group announced a VR standards initiative which resulted in OpenXR (Cross-Platform, Portable, Virtual Reality), defining APIs for VR and AR applications. Further information is available here: https://www.khronos.org/openxr. OpenXR defines two levels of API interfaces that a VR platform’s runtime can use to access the OpenXR ecosystem:
  • Apps and engines use standardized interfaces to interrogate and drive devices. Devices can self-integrate to a standardized driver interface.
  • Standardized hardware/software interfaces reduce fragmentation while leaving implementation details open to encourage industry innovation.
In this context, WebVR already defines an API which provides support for accessing virtual reality devices, including sensors and head-mounted displays, on the web. Link: https://webvr.info/ which includes a link to “WebVR, Editor’s Draft, 12 December 2017”. Important note: “Development of the WebVR API has halted in favor of being replaced the WebXR Device API. Several browsers will continue to support this version of the API in the meantime.”

The WebXR Device API Specification provides interfaces to VR and AR hardware to allow developers to build compelling, comfortable VR/AR experiences on the web. It is intended to completely replace the legacy WebVR specification when finalized. In the meantime, multiple browsers will continue to expose the older API. The latest version is the “WebXR Device API, Editor’s Draft, 11 January 2018”.

Additionally, there has been a presentation at the 3GPP/VRIF workshop, which is accessible here and provides a rough overview of W3C and W3C Immersive Web: Virtual and Augmented Reality.

DVB: following the conclusions of the DVB Virtual Reality Study Mission (summary can be found here), the DVB VR activity was promoted from a commercial module (CM) study mission on VR (CM-VR-SMG) to an official CM-VR group, as approved by the DVB CM on June 28th, 2017. The overall goal of the CM-VR is to deliver commercial requirements to be passed to the relevant DVB technical module (TM) groups, which will develop technical specifications targeting the delivery of VR content over DVB networks, as mandated by the DVB CM. A report on DVB and VR is available here, including some conclusions, commercial success factors, and technical aspects (i.e., frame rates, bit rates, FoV, resolution, geometrical congruency, degree of audio/visual immersion, head tracking latency) at the end. Additionally, it points out AR at the very end. The presentation at the 3GPP/VRIF workshop is available here.

CTA: there has been a presentation at the 3GPP/VRIF workshop, which is available here. The last slide mentions the creation of a first standards WG on AR/VR technology in May 2016. A web site is also available here.

IETF/IRTF: we found little activity related to immersive media within IETF/IRTF except https://tools.ietf.org/html/draft-han-iccrg-arvr-transport-problem-01, which expired in September 2017.

The SMPTE VR/AR Study Group was created on Feb 28, 2018 to study the current and projected needs for standardized approaches to capture and post produce images and sound to create a distribution master for Virtual Reality (VR) and Augmented Reality (AR) distribution and display systems. The goal is to study where standardization could be applied and to make recommendations, all of which are to be included in an Engineering Report to be published.

Project Overview
  • Problem to be solved: There are many different capture methods, file formats, display systems and post production methods for VR and AR content. The problem for the group to solve is to identify whether there is a need to standardize any of these methods so that interchange can be accomplished more easily. Once this study of the ecosystem is completed, the project will consolidate the findings and any recommendations into an Engineering Report.
  • Project scope: Study the current VR and AR ecosystem for production and post production workflows and create a report documenting the current ecosystem, relevant existing standards and recommendations of new standards, recommended practices or engineering guidelines.
  • Specific tasks: Explore the current VR and AR ecosystems and document that for the report. Investigate the needs in the industry for standardization of aspects of the production, processing and post production to create a VR/AR distribution master. Investigate if there are existing standards for production, post production of VR/AR content and document them for the report. Do the gap analysis between existing and required standardization for the production and post production of VR/AR content. Make recommendations for future standards and work required for the production and post production of VR/AR content to create a distribution master. 
Additionally, we found this section meeting which also provides some links to the presentations.

ETSI launched a new group on augmented reality, specifically an Augmented Reality Framework Industry Specification Group (ARF ISG), which can be found here. “In this initial phase of work the ARF ISG is interested in hearing from the industry about AR industrial use cases, obstacles encountered when deploying (pilot) AR services and requirements for interoperability.”

The Streaming Video Alliance (SVA) formed a Study Group on Virtual Reality/360 Degree Video late in 2016. Its current work is to document the relevant technologies and experiences in the 360-degree video market, and it is expected that its report will be published in May 2018. In addition, the SVA is looking to organize its second Proof of Concept later in 2018. Besides evaluating CDN performance for traditional video services, the SVA is looking to include VR360 content in order to understand latency factors and CDN impacts on 360-degree delivery.

If you think I missed something, please let me know and I'm happy to include it / update this blog post.