
Saturday, April 1, 2017

VR/360/Immersive Media (Streaming) Standardization-Related Activities

April 1, 2017 (1st version), September 14, 2017 (2nd version), May 22, 2018 (3rd version)

Universal media access (UMA), as proposed in the late 90s and early 2000s, is now a reality. It is very easy to generate, distribute, share, and consume any media content, anywhere, anytime, and with/on any device. Such real-time entertainment services — specifically, streaming audio and video — are typically deployed over the open, unmanaged Internet and now account for more than 70% of the evening traffic in North American fixed access networks. This number is expected to reach 80% by the end of 2020. A major technical breakthrough and enabler was certainly adaptive streaming over HTTP, which resulted in the standardization of MPEG-DASH.

One of the next big things in adaptive media streaming is most likely related to virtual reality (VR) applications and, specifically, omnidirectional (360-degree) media streaming, which is currently built on top of existing adaptive streaming ecosystems. The major interfaces of such an ecosystem are described in a Bitmovin blog post from some time ago (note: this area has since evolved into Immersive Media, referred to in MPEG as MPEG-I).


Omnidirectional video (ODV) content allows the user to change her/his viewing direction in multiple directions while consuming the video, resulting in a more immersive experience than traditional video content with a fixed viewing direction. Such video content can be consumed on different devices, ranging from smartphones and desktop computers to special head-mounted displays (HMDs) like Oculus Rift, Samsung Gear VR, HTC Vive, etc. When using an HMD to watch such content, the viewing direction can be changed by head movements. On smartphones and tablets, the viewing direction can be changed by touch interaction or by moving the device around, thanks to built-in sensors. On a desktop computer, the mouse or keyboard can be used for interacting with the omnidirectional video.

The streaming of ODV content is currently deployed in a naive way, simply streaming the entire 360-degree scene/view at constant quality without optimizing the quality for the user's viewport.
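
To give a rough feeling for why viewport-dependent streaming is attractive, here is a small back-of-the-envelope sketch (a minimal illustration; the field-of-view values and the equirectangular layout are my own assumptions, not numbers taken from any of the specifications discussed below):

```typescript
// Rough estimate of how much of an equirectangular 360-degree frame a
// typical HMD viewport actually covers (illustrative numbers, not from a spec).
function viewportCoverage(hFovDeg: number, vFovDeg: number): number {
  // An equirectangular frame spans 360 x 180 degrees, so the viewport
  // covers approximately this fraction of the coded pixels.
  return (hFovDeg / 360) * (vFovDeg / 180);
}

const coverage = viewportCoverage(100, 100); // ~0.154
console.log(`Viewport covers ~${(coverage * 100).toFixed(1)}% of the frame`);
// Streaming the whole sphere in uniform high quality thus spends roughly 85%
// of the bitrate on pixels the user does not currently see -- the motivation
// for viewport-dependent (e.g., tile-based) approaches.
```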

There are several ongoing standardization-related activities which I'd like to highlight in this blog post. I started with streaming-related aspects but now also cover general aspects related to immersive media applications.

The VR Industry Forum (VRIF) has been established with the aim "to further the widespread availability of high quality audiovisual VR experiences, for the benefit of consumers" and comprises working groups on requirements, guidelines, interoperability, communications, and liaison. VRIF published its guidelines at CES 2018, and these are available here. The initial release of the VRIF Guidelines focuses on the delivery ecosystem of 360° video with three degrees of freedom (3DOF) and incorporates:
  • Documentation of cross-industry interoperability points, based on ISO MPEG’s Omnidirectional Media Format (OMAF)
  • Best industry practices for production of VR360 content, with an emphasis on human factors such as motion sickness
  • Security considerations for VR360 streaming, focusing on content protection but also looking at user privacy.
Topics to be addressed in 2018 by VRIF include live virtual reality services and support for high dynamic range (HDR).

QUALINET is a European network concerned with Quality of Experience (QoE) in multimedia systems and services. In terms of VR/360, it runs a task force on "Immersive Media Experiences (IMEx)" to which everyone is invited to contribute. QUALINET also coordinates standardization activities in this area and can help organize and conduct formal QoE assessments in various domains; for example, it conducted various experiments during the development of MPEG-H High Efficiency Video Coding (HEVC). It recently established, together with VQEG (see also below), the Joint Qualinet-VQEG team on Immersive Media (JQVIM), and everyone is welcome to join (details can be found here).

JPEG started an initiative called Pleno focusing on images. At the 76th JPEG meeting in Turin, Italy, responses to the call for proposals for JPEG Pleno light field image coding were evaluated using subjective and objective evaluation metrics, and a Generic JPEG Pleno Light Field Architecture was created. The JPEG committee defined three initial core experiments to be performed before the 77th JPEG meeting in Macau, China. Additionally, the JPEG XS requirements document references VR applications, and JPEG recently created an AhG on JPEG360 with the mandates to collect and define use cases for 360-degree image capture applications, develop requirements for such use cases, solicit industry engagement, collect evidence of existing solutions, and update the description of needed metadata.

In terms of MPEG, I've previously reported about MPEG-I as part of my MPEG report (also here for the most recent MPEG report), which currently includes five parts. The first part will be a technical report describing the scope of this new standard and a set of use cases and applications from which actual requirements can be derived; technical reports are usually publicly available for free. The second part specifies the omnidirectional media application format (OMAF), addressing the urgent need of the industry for a standard in this area. Part three will address immersive video and part four immersive audio. Finally, part five will contain a specification for point cloud compression, for which a call for proposals is currently available. OMAF is part of a first phase of standards related to immersive media, and its FDIS is already available. Work on a 2nd edition of OMAF has started with the goal of supporting 3DoF+ and social VR. The MPEG-I architectures (yes, plural) are becoming mature and I think this technical report will become available very soon. In terms of video, MPEG-I looks more closely at 3DoF+, defining common test conditions, with a call for proposals (CfP) planned for MPEG123 in Ljubljana, Slovenia. Additionally, explorations for 6DoF and for the compression of dense representations of light fields have been started and are ongoing. Finally, point cloud compression (PCC) is in its hot phase of core experiments for various coding tools, resulting in updated versions of the test model and working draft.

The Spatial Relationship Descriptor (SRD) of the MPEG-DASH standard provides means to describe how the media content is organized in the spatial domain. In particular, the SRD is fully integrated into the media presentation description (MPD) of MPEG-DASH and is used to describe a grid of rectangular tiles, which allows a client implementation to request only a given region of interest, typically associated with a contiguous set of tiles. Interestingly, the SRD was developed before OMAF, and how the SRD is used with OMAF is currently subject to standardization. Speaking of MPEG-DASH, there was a presentation at the 3GPP/VRIF workshop which is available here.
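
As a simple illustration of how a client could consume SRD information, the following sketch parses the comma-separated value string of an SRD property descriptor. This is a minimal sketch assuming the value layout source_id, object_x, object_y, object_width, object_height, total_width, total_height; the interface and field names in the code are my own, not normative identifiers.

```typescript
// Hypothetical parser for an SRD descriptor value, e.g. from an MPD element like
// <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
//                       value="0,1920,0,1920,1080,3840,2160"/>
interface SpatialRelationship {
  sourceId: number;      // identifies the source the tiles belong to
  objectX: number;       // top-left x of this tile in the reference space
  objectY: number;       // top-left y of this tile in the reference space
  objectWidth: number;   // tile width in the reference space
  objectHeight: number;  // tile height in the reference space
  totalWidth?: number;   // width of the full reference space (optional)
  totalHeight?: number;  // height of the full reference space (optional)
}

function parseSrdValue(value: string): SpatialRelationship {
  const f = value.split(",").map((v) => parseInt(v.trim(), 10));
  return {
    sourceId: f[0],
    objectX: f[1],
    objectY: f[2],
    objectWidth: f[3],
    objectHeight: f[4],
    totalWidth: f[5],
    totalHeight: f[6],
  };
}

// A client could use this to request only the tiles intersecting the viewport.
const srd = parseSrdValue("0,1920,0,1920,1080,3840,2160");
console.log(srd.objectWidth, srd.totalWidth); // 1920 3840
```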

In 3GPP, the TSG SA WG4 (3GPP SA4) Video SWG deals with a Rel-15 work item on VR Streaming and a Rel-16 Study Item on QoE Reports for VR Streaming. During SA4#98 in April 2018, the following happened:
  • Significant progress on video operation points; in total four are defined: two of them are legacy (ERP and mono only), and two use (MPEG/OMAF) features for viewport-dependent and stereo distribution.
  • Progress on the media profiles for video, with five media profiles: four directly enabling distribution of the operation points with the file format and DASH, and one media profile for tile-based streaming.
  • On audio, the audio architecture was updated to include different renderers as well as the relevant APIs. An audio profile submission process was agreed in S4-180629, with some remaining issues to be clarified in telephone conferences, enabling the characterization of audio technologies in a VR environment including head tracking and binauralization.
  • An updated TS 26.118 v0.5.0 was produced in S4-180559, including all agreements of the meeting, and will be sent to the SA plenary for information.
  • The work is expected to be completed within the Rel-15 time plan, i.e., by September 2018.
  • On the QoE metrics, the latest information can be found in TR 26.926 v0.3.0 in S4-180560.
IEEE has started the IEEE P2048 family of standards, here specifically "P2048.2 Standard for Virtual Reality and Augmented Reality: Immersive Video Taxonomy and Quality Metrics" -- to define different categories and levels of immersive video -- and "P2048.3 Standard for Virtual Reality and Augmented Reality: Immersive Video File and Stream Formats" -- to define formats of immersive video files and streams, and the functions and interactions enabled by the formats -- but not much material is publicly available right now. However, P2048.2 seems to be related to QUALINET's work, and P2048.3 could definitely benefit from what MPEG has done and is still doing (incl. also, e.g., MPEG-V). Additionally, there's IEEE P3333.3, which defines a standard for HMD-based 3D content motion sickness reducing technology, aiming to resolve VR sickness caused by the visual mechanisms of HMD-based 3D content through the study of i) the visual response to focal distortion, ii) the visual response to the lens materials, iii) the visual response to the lens refraction ratio, and iv) the visual response to the frame rate.

The ITU-T started a new work program referred to as "G.QoE-VR" after successfully finalizing P.NATS, which is now called P.1203. However, there are no details about "G.QoE-VR" publicly available yet; I just found this here. According to @slhck, G.QoE-VR will generally focus on HMD-based VR streaming, the investigation of subjective test methodologies and, later, instrumental QoE models. This is also confirmed here, with the expected deliverables of this study group being recommendations on QoE factors and requirements for VR, subjective test methodologies for assessing VR quality, and objective quality estimation model(s) for VR services. In this context, it's worth mentioning the Video Quality Experts Group (VQEG), which has an Immersive Media Group (IMG) with a mission on "quality assessment of immersive media, including virtual reality, augmented reality, stereoscopic 3DTV, multiview". IMG is also involved in JQVIM, introduced above.

Finally, the Khronos Group presented at the 3GPP/VRIF workshop, which is accessible here, and an overview of OpenXR (March '18) can be found here. The Khronos Group announced a VR standards initiative which resulted in OpenXR (Cross-Platform, Portable, Virtual Reality), defining APIs for VR and AR applications. Further information is available here: https://www.khronos.org/openxr. OpenXR defines two levels of API interfaces that a VR platform's runtime can use to access the OpenXR ecosystem:
  • Apps and engines use standardized interfaces to interrogate and drive devices. Devices can self-integrate to a standardized driver interface.
  • Standardized hardware/software interfaces reduce fragmentation while leaving implementation details open to encourage industry innovation.
In this context, WebVR already defines an API which provides support for accessing virtual reality devices, including sensors and head-mounted displays, on the web. Link: https://webvr.info/, which includes a link to "WebVR, Editor's Draft, 12 December 2017". Important note: "Development of the WebVR API has halted in favor of being replaced by the WebXR Device API. Several browsers will continue to support this version of the API in the meantime."
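
To give an impression of what the (now legacy) WebVR API looks like from an application's point of view, here is a minimal sketch; it assumes a WebVR 1.1-capable browser and an existing WebGL canvas, omits error handling and actual rendering, and loosens types with `any` since WebVR definitions are no longer part of the standard DOM typings.

```typescript
// Minimal WebVR 1.1 render-loop sketch (legacy API, superseded by WebXR).
async function startWebVr(canvas: HTMLCanvasElement): Promise<void> {
  const nav = navigator as any;
  if (!nav.getVRDisplays) {
    console.warn("WebVR is not supported in this browser");
    return;
  }
  const displays = await nav.getVRDisplays();
  if (displays.length === 0) return;

  const display = displays[0];
  // In real code, requestPresent() must be triggered by a user gesture.
  await display.requestPresent([{ source: canvas }]);

  const frameData = new (window as any).VRFrameData();
  const onFrame = () => {
    display.getFrameData(frameData); // pose plus view/projection matrices
    // ... render left/right eye views using frameData here ...
    display.submitFrame();           // hand the rendered frame to the HMD
    display.requestAnimationFrame(onFrame);
  };
  display.requestAnimationFrame(onFrame);
}
```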

The WebXR Device API Specification provides interfaces to VR and AR hardware to allow developers to build compelling, comfortable VR/AR experiences on the web. It is intended to completely replace the legacy WebVR specification when finalized; in the meantime, multiple browsers will continue to expose the older API. The latest version, "WebXR Device API, Editor's Draft, 11 January 2018", provides an interface to VR/AR hardware.

Additionally, there was a presentation at the 3GPP/VRIF workshop which is accessible here and provides a rough overview of W3C and the W3C Immersive Web: Virtual and Augmented Reality activity.

DVB: following the conclusions of the DVB Virtual Reality Study Mission (a summary can be found here), the DVB VR activity was promoted from a Commercial Module (CM) study mission on VR (CM-VR-SMG) to an official CM-VR group, as approved by the DVB CM on June 28, 2017. The overall goal of the CM-VR is to deliver commercial requirements to be passed to the relevant DVB Technical Module (TM) groups, which will then work on technical specifications targeting the delivery of VR content over DVB networks, as mandated by the DVB CM. A report on DVB and VR is available here, including some conclusions, commercial success factors, and technical aspects (i.e., frame rates, bit rates, FoV, resolution, geometrical congruency, degree of audio/visual immersion, head tracking latency) at the end; it also points out AR at the very end. A presentation at the 3GPP/VRIF workshop is available here.

CTA: there has been a presentation at the 3GPP/VRIF workshop which is available here. The last slide mentions the creation of a first standards WG on AR/VR technology in May 2016. A web site is available here.

IETF/IRTF: we found little activity related to immersive media except https://tools.ietf.org/html/draft-han-iccrg-arvr-transport-problem-01, which expired in September 2017.

The SMPTE VR/AR Study Group was created on February 28, 2018 to study the current and projected needs for standardized approaches to capture and post-produce images and sound in order to create a distribution master for Virtual Reality (VR) and Augmented Reality (AR) distribution and display systems. The goal is to study where standardization could be applied and to make recommendations, all of which are to be included in an Engineering Report to be published.

Project Overview
  • Problem to be solved: There are many different capture methods, file formats, display systems and post production methods for VR and AR content. The problem for the group to solve is to identify whether there is a need to standardize any of these methods so that interchange can be more easily accomplished. Once this study of the ecosystem is completed, the project will consolidate the findings and any recommendations into an Engineering Report.
  • Project scope: Study the current VR and AR ecosystem for production and post production workflows and create a report documenting the current ecosystem, relevant existing standards and recommendations of new standards, recommended practices or engineering guidelines.
  • Specific tasks: Explore the current VR and AR ecosystems and document them for the report. Investigate the needs in the industry for standardization of aspects of the production, processing and post production to create a VR/AR distribution master. Investigate whether there are existing standards for production and post production of VR/AR content and document them for the report. Perform a gap analysis between existing and required standardization for the production and post production of VR/AR content. Make recommendations for future standards and work required for the production and post production of VR/AR content to create a distribution master.
Additionally, we found this section meeting, which also provides some links to the presentations.

ETSI launched a new group on augmented reality, specifically the Augmented Reality Framework Industry Specification Group (ARF ISG), which can be found here. "In this initial phase of work the ARF ISG is interested in hearing from the industry about AR industrial use cases, obstacles encountered when deploying (pilot) AR services and requirements for interoperability."

The Streaming Video Alliance (SVA) formed a Study Group on Virtual Reality/360 Degree Video in late 2016. Its current work is to document the relevant technologies and experiences in the 360-degree video market, and its report is expected to be published in May 2018. The SVA is also looking to organize its second Proof of Concept for later in 2018: in addition to evaluating CDN performance for traditional video services, the SVA is looking to include VR360 content in order to understand latency factors and CDN impacts on 360-degree delivery.

If you think I missed something, please let me know and I'm happy to include it / update this blog post.

Monday, January 3, 2011

Watching Video over the Web

... to start 2011 with some interesting articles, I'd like to share with you two articles from IEEE Internet Computing entitled "Watching Video over the Web" written by Ali Begen, Tankut Akgul, and Mark Baugher.

Part I: Streaming Protocols

Abstract: A U.S. consumer watches TV for about five hours a day on average. While the majority of viewed content is still the broadcast TV programming, the share of the time‐shifted content has been ever increasing. One third of the U.S. consumers currently use a digital video recorder (DVR)‐like device for time‐shifting, however, the trends are showing that more and more consumers are going to the Web to watch their favorite shows and movies on a computer or mobile device. Increasingly, the Web is coming to the digital TV, which incorporates movie downloads and streaming using Web protocols. In this first part of a two‐part article, the authors describe both conventional and emerging streaming solutions using Web and non‐Web protocols and provide a detailed comparison.

Citation: Ali Begen, Tankut Akgul, Mark Baugher, "Watching Video over the Web, Part I: Streaming Protocols," IEEE Internet Computing, 22 Dec. 2010. IEEE Computer Society Digital Library. IEEE Computer Society, http://doi.ieeecomputersociety.org/10.1109/MIC.2010.155

Part II: Applications, Standardization and Open Issues

Abstract: In this second part of a two-part article, the authors look into applications for streaming including end-to-end mobile and in-home streaming, contrasting adaptive approaches to other video delivery paradigms, discuss the current standardization efforts and highlight the areas that still require further research and investigation.

Citation: Ali Begen, Tankut Akgul, Mark Baugher, "Watching Video over the Web, Part II: Applications, Standardization and Open Issues," IEEE Internet Computing, 22 Dec. 2010. IEEE Computer Society Digital Library. IEEE Computer Society, http://doi.ieeecomputersociety.org/10.1109/MIC.2010.156

Wednesday, March 17, 2010

O Universal Multimedia Access, Where Art Thou? (Part IV)

-by Christian Timmerer, Klagenfurt University, Austria

Preface: At first I thought about writing this article for a journal or something equivalent, but then I decided to make it available through my blog. The aim is to perform an experiment in order to determine whether it is possible (a) to get direct feedback through comments and (b) to be referenced from elsewhere. As it is a quite comprehensive article, it is split up into separate parts. If someone (i.e., a journal editor) is interested in publishing this article, yes, I can still do that! :-)

Part I gave an introduction to the topic and an overview of multimedia content adaptation techniques. Part II was about the adaptation-by-transformation approach that utilizes scalable coding formats such as JPEG2000, MPEG-4 BSAC, and MPEG-4 SVC. Part III covered adaptation decision-taking, also known as the brain of multimedia content adaptation, and this part is about standardization support for UMA.

Part IV - Standardization support for UMA


The nice thing about standards is that there are so many to choose from. Furthermore, if you do not like any of them, you can just wait for next year’s model.
--Andrew S. Tanenbaum

A couple of standardization organizations (SDOs) provide support for UMA:

World Wide Web Consortium (W3C): http://www.w3.org/
Internet Engineering Task Force (IETF): http://www.ietf.org/
  • Audio/Video Transport (AVT)
  • Media Server Control (MEDIACTRL)
  • Multiparty Multimedia Session Control (MMUSIC)
  • Session Description Protocol (SDP) and Session Initiation Protocol (SIP)
  • Next Steps in Signaling (NSIS)
Moving Picture Experts Group (MPEG): http://www.chiariglione.org/mpeg/
A comprehensive description format with respect to UMA is Part 7 of MPEG-21, entitled Digital Item Adaptation [2]. This part of MPEG-21 defines - among other things - the Usage Environment Description (UED), which provides means for describing the context in which Digital Items may be consumed. The UED is clustered into the following categories, with some examples given:
  • User Characteristics: e.g., usage history, display presentation preferences, audio/visual impairments, mobility, etc.
  • Terminal Capabilities: e.g., coding capabilities, display capabilities, audio output capabilities, etc.
  • Network Characteristics: e.g., network capabilities (e.g., max capacity, min guaranteed) and conditions (e.g., available bandwidth)
  • Natural Environment Characteristics: e.g., noise level, illumination characteristics, location, time, etc.
The UED is defined as an XML Schema, which is publicly available here.
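
Purely for illustration, the following sketch shows how a client application might model the kind of context information the UED covers before serializing it into the standardized XML format; the field names are my own shorthand and not the normative MPEG-21 DIA element names.

```typescript
// Illustrative, simplified model of the MPEG-21 DIA Usage Environment
// Description categories; the real UED is an XML Schema with different
// (normative) element names.
interface UsageEnvironment {
  user: {
    presentationPreferences?: { preferredColorTemperature?: number };
    auditoryImpairment?: boolean;
  };
  terminal: {
    codecCapabilities: string[];     // e.g. ["AVC", "HEVC"]
    displayResolution: { width: number; height: number };
  };
  network: {
    maxCapacityKbps: number;         // static capability
    availableBandwidthKbps: number;  // dynamic condition
  };
  naturalEnvironment: {
    noiseLevelDb?: number;
    illuminanceLux?: number;
    location?: string;
  };
}

// Example: a context description an adaptation engine could act upon.
const context: UsageEnvironment = {
  user: { auditoryImpairment: false },
  terminal: {
    codecCapabilities: ["AVC"],
    displayResolution: { width: 1280, height: 720 },
  },
  network: { maxCapacityKbps: 6000, availableBandwidthKbps: 2500 },
  naturalEnvironment: { noiseLevelDb: 45 },
};
```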

This is the end of Part IV and I'm currently not sure whether a Part V will follow...

References:
[1] Ian Burnett, Fernando Pereira, Rik Van de Walle, and Rob Koenen (eds.), The MPEG-21 Book, John Wiley and Sons Ltd, 2006.
[2] Anthony Vetro and Christian Timmerer, Digital Item Adaptation: Overview of Standardization and Research Activities, IEEE Transactions on Multimedia, vol. 7, no. 3, pp. 418-426, June 2005.

    Monday, September 21, 2009

    MPEG Multimedia Standards Workshop

    Purpose

    Multimedia technologies have played an essential role in the deployment of many consumer and professional products and services around the world, and they have now entered everyday life. All indications show that, as in the past, they will continue to play an important role in the deployment of future products and services. This workshop will present some of the new multimedia technologies, their achievements and their applications, especially audio and video standards and technologies as defined and studied by ISO/IEC JTC1 SC29 WG11, ITU-T SG16 Q.6 and the AVS Working Group of the Ministry of Information Industry of China.

    Host & Date & Venue

    The Workshop on Multimedia New Technologies and Applications is hosted by Xidian University, Xi'an, China.
    The workshop will be held on Saturday, 31 Oct. 2009 (a full-day meeting).
    The conference room is in the Library of Xidian University.
    Open to all MPEG and JPEG members.

    Organization Committee

    Chairman

    • Xin Wang
    • University of Southern California, USA

    Members

    • Touradj Ebrahimi
    • EPFL, Switzerland
    • Wen Gao
    • Peking University, China
    • Tiejun Huang
    • Peking University, China
    • Joern Ostermann
    • Leibniz Universitaet Hannover, Germany
    • Guangming Shi
    • Xidian University, China
    • Huifang Sun
    • MERL Technology Lab, USA
    • Feng Wu
    • Microsoft, China

    Program (Final)

    Time Topic Speaker Title
    09:30 Welcome
    09:40 Video Lu Yu
    10:10 3DV Karsten Mueller 3D Video Formats and Coding Standards
    10:40 MPEG-U Jaeyeon Song Widget Technologies in the Convergence Environment
    11:10 Coffee break
    11:20 RVC Euee S. Jang Introduction to MPEG Reconfigurable Video Coding
    11:50 Audio Minjie Xie MPEG Unified Speech and Audio Coding (USAC)
    12:20 BIFS, LASeR YoungKwon Lim Rich Media standards in MPEG
    12:50 Lunch

    14:00 AVS Wen Gao
    14:30 MPEG-7 Miroslaw Bober
    15:00 MAF KyuHeon Kim
    15:30 Coffee break
    15:45 MPEG-21 Xin Wang MPEG-21: Multimedia Framework Standards
    16:15 MPEG-V Sanghyun Joo Introduction to MPEG-V activities
    16:45 MXM Christian Timmerer MPEG-M: MPEG Extensible Middleware - Accelerating Media Business Developments
    17:15 Q&A Session Speakers, Audience
    18:00 Adjourn

    Wednesday, June 3, 2009

    Peer-to-peer for content delivery for IPTV services: analysis of mechanisms and NGN impacts

    I just found this document here, which is a (first) draft of ETSI TISPAN's work in the area of P2P television. It seems that P2P is becoming a major issue for standardization:
    • IETF mainly has the ALTO and P2PSIP working groups, but P2P streaming is also pushing "on the market."
    • DVB recently released an Internet TV questionnaire which has some P2P aspects.
    • W3C had some P2P activity in the past but now it seems to be silent with respect to this topic. Maybe the situation changes in the future...
    • MPEG provides the baseline technologies (codecs and file formats) but also started an exploration activity towards an Advanced IPTV Terminal (AIT), probably jointly with ITU-T.
    The future will show us which of these activities will be successful and I'd like to close with a quote that fits here very well, i.e., "The nice thing about standards is that you have so many to choose from". --Andrew S. Tanenbaum

    Let me know in case something is missing or wrong.

    Wednesday, April 30, 2008

    MPEG news: a report from the 84th meeting in Archamps, France

    The 84th MPEG meeting was held in Archamps, France (next to Geneva), and I'd like to briefly review a few of the topics discussed at this meeting.

    MPEG Representation of Sensory Effects

    I've reported on this topic recently; it is now maturing towards a Call for Proposals (CfP), and the proposed technologies will be evaluated during the July meeting in Hannover, Germany. In this CfP, MPEG is requesting technologies for the following items to be standardized:
    • Sensory Effect Metadata: description schemes and descriptors representing Sensory Effects.
    • Sensory Device Capabilities and Commands: description schemes and descriptors representing characteristics of Sensory Devices and means to control them.
    • User Sensory Preferences: description schemes and descriptors representing user preferences with respect to rendering of sensory effects.

    Microsoft has adopted MPEG-21 Digital Item Declaration and Identification

    Microsoft has adopted MPEG-21 technology within their Interactive Media Manager, a collaborative media management solution that extends Microsoft Office SharePoint® Server 2007 for media and entertainment companies. Interestingly, they have adopted the Digital Item Model - an abstract model expressed in EBNF - and defined their own implementation of this model using RDF/OWL. Note that MPEG's implementation of the model is called the Digital Item Declaration Language (DIDL), which is based on XML Schema.

    Note: UPnP has also adopted MPEG-21 DIDL, but in a dialect called DIDL-Lite.

    Multimedia Extensible Middleware (MXM)

    The Multimedia Extensible Middleware (MXM) aims to define APIs for various purposes. The requirements and a Call for Proposals (requirements) were issued at this meeting. It is very interesting to see that it should also accommodate peer-to-peer technologies, i.e., storage/consumption of content in a distributed environment (based on a P2P infrastructure, e.g., distributed hash tables).

    Other issues discussed at the meeting
    • Presentation of Structured Information: For this item a Call for Proposals (requirements) has been issued. One use case and overall model is to include a presentation description within Digital Items, such that upon receipt of a Digital Item this information asset is automatically extracted and used to present the Digital Item.
    • MPEG User Interface Framework: personalized, adaptable and exchangeable rich user interfaces based on the use of a presentation format (BIFS, LASeR, ...), a language for describing the personalization context (UED, CC/PP, DCO, ...), and home network protocols (DLNA, UPnP, ...).
    • MPEG-V: This new project item defines interfaces between virtual worlds and between virtual worlds and the real world. An extended call for requirements has been issued.
    • Multimedia Value Chain Ontology (MVCO): aims to define an ontology for the whole multimedia value chain and a Call for Proposals (requirements) has been issued. However, the main focus is on rights-related issues at this moment.