Showing posts with label 3d audio. Show all posts

Thursday, July 16, 2020

MPEG131 Press Release: MPEG-H 3D Audio – WG11 (MPEG) promotes Baseline Profile for 3D Audio to final stage

MPEG131 Press Release: Index

MPEG-H 3D Audio – WG11 (MPEG) promotes Baseline Profile for 3D Audio to the final stage

At its 131st meeting, WG11 (MPEG) announces the completion of the new ISO/IEC 23008-3:2019, Amendment 2, "3D Audio Baseline profile, Corrections and Improvements," which has been promoted to Final Draft Amendment (FDAM) status. This amendment introduces a new profile called Baseline profile addressing industry demands. Tailored for broadcast, streaming, and high-quality immersive music delivery use cases, the 3D Audio Baseline profile supports channel and object signals and is a subset of the existing Low Complexity profile. The 3D Audio Baseline profile can be signaled in a backwards compatible fashion, enabling interoperability with existing devices implementing the 3D Audio Low Complexity profile. In addition to its advanced loudness and Dynamic Range Control (DRC), interactivity and accessibility features, the Baseline profile enables the usage of up to 24 audio objects in Level 3 for high quality immersive music delivery.

At the same time, MPEG initiates New Editions at Committee Draft (CD) status for MPEG-H 3D Audio Reference Software and Conformance which incorporate the 3D Audio Baseline profile functionality.
In addition to finalizing the Amendment, WG11 made available the “MPEG-H 3D Audio Baseline Profile Verification Test Report”. This reports on the results of five subjective listening tests assessing the performance of the 3D Audio Baseline profile. Covering a wide range of bit rates and immersive audio use cases, the tests were conducted in nine different test sites with a total of 341 listeners. 

Analysis of the test data resulted in the following conclusions:
  • Test 1 measured performance for the “Ultra-HD Broadcast” use case, in which highly immersive audio material was coded at 768 kb/s and presented using 22.2 or 7.1+4H channel loudspeaker layouts. The test showed that at the bit rate of 768 kb/s, the 3D Audio Baseline Profile easily achieves “ITU-R High-Quality Emission” quality, as needed in broadcast applications.
  • Test 2 measured performance for the “HD Broadcast” or “A/V Streaming” use case, in which immersive audio material was coded at three bit rates: 512 kb/s, 384 kb/s and 256 kb/s and presented using 7.1+4H or 5.1+2H channel loudspeaker layouts. The test showed that for all bit rates, the 3D Audio Baseline Profile achieved a quality of “Excellent” on the MUSHRA subjective quality scale.
  • Test 3 measured performance for the “High Efficiency Broadcast” use case, in which audio material was coded at three bit rates, with specific bit rates depending on the number of channels in the material. Bit rates ranged from 256 kb/s (5.1+2H) to 48 kb/s (stereo). The test showed that for all bit rates, the 3D Audio Baseline Profile achieved a quality of “Excellent” on the MUSHRA subjective quality scale.
  • Test 4 measured performance for the “Mobile” use case, in which immersive audio material was coded at 384 kb/s, and presented via headphones. The 3D Audio FD binaural renderer was used to render a virtual, immersive audio sound stage for the headphone presentation. The test showed that at 384 kb/s, the 3D Audio Baseline Profile with binaural rendering achieved a quality of “Excellent” on the MUSHRA subjective quality scale.
  • Test 5 measured performance for the “High Quality Immersive Music Delivery” use case, in which object-based immersive music is delivered to the receiver with up to 24 objects at high per-object bit rates. This test used 11.1 (as 7.1+4H) as the presentation format, with material coded at a rate of 1536 kb/s. The test showed that at that bit rate, the 3D Audio Baseline Profile easily achieves “ITU-R High-Quality Emission” quality, as needed in high-quality music delivery applications.

Friday, February 10, 2017

MPEG news: a report from the 117th meeting, Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects. Additionally, this version of the blog post will also be posted at ACM SIGMM Records.
MPEG News Archive
The 117th MPEG meeting was held in Geneva, Switzerland and its press release highlights the following aspects:
  • MPEG issues Committee Draft of the Omnidirectional Media Application Format (OMAF)
  • MPEG-H 3D Audio Verification Test Report
  • MPEG Workshop on 5-Year Roadmap Successfully Held in Geneva
  • Call for Proposals (CfP) for Point Cloud Compression (PCC)
  • Preliminary Call for Evidence on video compression with capability beyond HEVC
  • MPEG issues Committee Draft of the Media Orchestration (MORE) Standard
  • Technical Report on HDR/WCG Video Coding
In this blog post, I'd like to focus on the topics related to multimedia communication. Thus, let's start with OMAF.

Omnidirectional Media Application Format (OMAF)

Real-time entertainment services deployed over the open, unmanaged Internet – streaming audio and video – now account for more than 70% of the evening traffic in North American fixed access networks and it is assumed that this figure will reach 80% by 2020. More and more such bandwidth-hungry applications and services are pushing onto the market, including immersive media services such as virtual reality and, specifically, 360-degree video. However, the lack of appropriate standards and, consequently, reduced interoperability is becoming an issue. Thus, MPEG has started a project referred to as Omnidirectional Media Application Format (OMAF). The first milestone of this standard has been reached and the committee draft (CD) has been approved at the 117th MPEG meeting. Such application formats "are essentially superformats that combine selected technology components from MPEG (and other) standards to provide greater application interoperability, which helps satisfy users' growing need for better-integrated multimedia solutions" [MPEG-A]. In the context of OMAF, the following aspects are defined:
  • Equirectangular projection format (note: others might be added in the future)
  • Metadata for interoperable rendering of 360-degree monoscopic and stereoscopic audio-visual data
  • Storage format: ISO base media file format (ISOBMFF)
  • Codecs: High Efficiency Video Coding (HEVC) and MPEG-H 3D audio
OMAF is the first specification which is defined as part of a bigger project currently referred to as ISO/IEC 23090 -- Immersive Media (Coded Representation of Immersive Media). It currently has the acronym MPEG-I; we previously used MPEG-VR, which is now replaced by MPEG-I (and that might still change in the future). It is expected that the standard will become Final Draft International Standard (FDIS) by Q4 of 2017. Interestingly, it does not include AVC and AAC, probably the most obvious candidates for video and audio codecs, which have been massively deployed in the last decade and will probably still be a major dominator (and also denominator) in upcoming years. On the other hand, the equirectangular projection format is currently the only one defined, as it is already broadly used in off-the-shelf hardware/software solutions for the creation of omnidirectional/360-degree videos. Finally, the metadata formats enabling the rendering of 360-degree monoscopic and stereoscopic video are highly appreciated. A solution for MPEG-DASH based on AVC/AAC utilizing the equirectangular projection format for both monoscopic and stereoscopic video is shown as part of Bitmovin's solution for VR and 360-degree video.

Research aspects related to OMAF can be summarized as follows:
  • HEVC supports tiles which allow for efficient streaming of omnidirectional video but HEVC is not as widely deployed as AVC. Thus, it would be interesting to investigate how to mimic such a tile-based streaming approach utilizing AVC.
  • The question of how to efficiently encode and package HEVC tile-based video is an open issue and calls for a tradeoff between tile flexibility and coding efficiency.
  • When combined with MPEG-DASH (or similar), there's a need to update the adaptation logic because, with tiles, yet another dimension is added that needs to be considered in order to provide a good Quality of Experience (QoE).
  • QoE is a big issue here and not well covered in the literature. Various aspects are worth investigating, including a comprehensive dataset to enable reproducibility of research results in this domain. Finally, as omnidirectional video allows for interactivity, the user experience is also becoming an issue which needs to be covered within the research community.
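To make the "extra dimension" in the adaptation logic a bit more tangible, here is a minimal sketch in Python of a viewport-aware, per-tile quality selection. It is not taken from OMAF or MPEG-DASH: the bitrate ladder, the tile names, and the function `select_tile_qualities` are hypothetical, and the greedy strategy (base quality everywhere, then upgrade viewport tiles first) is just one possible assumption about what such a logic could look like.

```python
from typing import Dict, List, Set

# Hypothetical per-tile bitrate ladder in kb/s, lowest to highest quality.
LADDER_KBPS: List[int] = [200, 500, 1000]


def select_tile_qualities(
    tiles: List[str],
    viewport: Set[str],
    budget_kbps: int,
    ladder: List[int] = LADDER_KBPS,
) -> Dict[str, int]:
    """Greedy sketch: give every tile the lowest quality, then spend the
    remaining budget on tiles inside the viewport before all other tiles."""
    level = {tile: 0 for tile in tiles}  # index into the ladder per tile
    spent = len(tiles) * ladder[0]
    if spent > budget_kbps:
        raise ValueError("budget too small for base quality on all tiles")

    # Tiles in the viewport are upgraded first, then the remaining tiles.
    ordered = [t for t in tiles if t in viewport]
    ordered += [t for t in tiles if t not in viewport]

    for tile in ordered:
        # Upgrade this tile step by step while the budget allows it.
        while level[tile] + 1 < len(ladder):
            cost = ladder[level[tile] + 1] - ladder[level[tile]]
            if spent + cost > budget_kbps:
                break
            level[tile] += 1
            spent += cost

    return {tile: ladder[index] for tile, index in level.items()}
```

A real client would additionally have to deal with viewport prediction, buffer state, and quality switches between neighboring tiles, which is exactly where the open QoE questions above come in.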
A second topic I'd like to highlight in this blog post is related to the preliminary call for evidence on video compression with capability beyond HEVC.

Preliminary Call for Evidence on video compression with capability beyond HEVC

A call for evidence is issued to see whether sufficient technological potential exists to start a more rigid phase of standardization. Currently, MPEG together with VCEG has developed a Joint Exploration Model (JEM) algorithm that is already known to provide bit rate reductions in the range of 20-30% for relevant test cases, as well as subjective quality benefits. The goal of this new standard -- with a preliminary target date for completion around late 2020 -- is to develop technology providing better compression capability than the existing standard, not only for conventional video material but also for other domains such as HDR/WCG or VR/360-degree video. An important aspect in this area is certainly over-the-top video delivery (like with MPEG-DASH) which includes features such as scalability and Quality of Experience (QoE). Scalable video coding has been added to video coding standards since MPEG-2 but never reached widespread adoption. That might change in case it becomes a prime-time feature of a new video codec, as scalable video coding clearly shows benefits when doing dynamic adaptive streaming over HTTP. QoE has already found its way into video coding, at least when it comes to evaluating the results, where subjective tests are now an integral part of every new video codec developed by MPEG (in addition to the usual PSNR measurements). Therefore, the most interesting research topics from a multimedia communication point of view would be to optimize the DASH-like delivery of such new codecs with respect to scalability and QoE. Note that if you don't like scalable video coding, feel free to propose something else as long as it reduces storage and networking costs significantly.
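As a rough back-of-the-envelope illustration of why scalable coding is attractive for DASH-like delivery, the following Python sketch compares the stored bitrate of simulcast (one independent stream per representation) with a layered stream. All numbers are assumptions made up for this example: the bitrate ladder is hypothetical, and the model simply charges a fixed percentage of overhead per enhancement layer on top of the highest representation, which is a strong simplification of real scalable codecs.

```python
from typing import List, Tuple


def storage_comparison(
    ladder_kbps: List[int], overhead_percent_per_layer: int
) -> Tuple[int, float, float]:
    """Return (simulcast_kbps, scalable_kbps, relative_saving).

    Simulcast stores every representation separately. The scalable variant
    is modeled (assumption) as the highest representation plus a fixed
    overhead for each enhancement layer.
    """
    if not ladder_kbps:
        raise ValueError("ladder must not be empty")
    simulcast = sum(ladder_kbps)
    enhancement_layers = len(ladder_kbps) - 1
    overhead_percent = overhead_percent_per_layer * enhancement_layers
    scalable = max(ladder_kbps) * (100 + overhead_percent) / 100
    saving = 1 - scalable / simulcast
    return simulcast, scalable, saving
```

For a hypothetical ladder of 1000, 2500, and 5000 kb/s and an assumed 10% overhead per enhancement layer, simulcast stores 8500 kb/s in total while the layered stream needs 6000 kb/s, i.e., roughly 29% less under these made-up assumptions.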

MPEG Workshop “Global Media Technology Standards for an Immersive Age”

On January 18, 2017, MPEG successfully held a public workshop on “Global Media Technology Standards for an Immersive Age” hosting a series of keynotes from Bitmovin, DVB, Orange, Sky Italia, and Technicolor. Stefan Lederer, CEO of Bitmovin, discussed today's and future challenges with new forms of content like 360°, AR and VR. All slides are available here and MPEG took their feedback into consideration in an update of its 5-year standardization roadmap. David Wood (EBU) reported on the DVB VR study mission and Ralf Schaefer (Technicolor) presented a snapshot on VR services. Gilles Teniou (Orange) discussed video formats for VR, pointing out a new opportunity to increase the content value but also raising the question of what is missing today. Finally, Massimo Bertolotti (Sky Italia) introduced his view on the immersive media experience age.

Overall, the workshop was well attended and, as mentioned above, MPEG is currently working on a new standards project related to immersive media. Currently, this project comprises five parts. The first part comprises a technical report describing the scope (incl. kind of system architecture), use cases, and applications. The second part is OMAF (see above) and the third/fourth parts are related to immersive video and audio respectively. Part five is about point cloud compression.

For those interested, please check out the slides from industry representatives in this field and draw your own conclusions about what could be interesting for your own research. I'm happy to see any reactions, hints, etc. in the comments.

Finally, let's have a look at what happened related to MPEG-DASH, a topic with a long history on this blog.

MPEG-DASH and CMAF: Friend or Foe?

For MPEG-DASH and CMAF it was a meeting "in between" official standardization stages. MPEG-DASH experts are still working on the third edition which will be a consolidated version of the 2nd edition and various amendments and corrigenda. In the meantime, MPEG issued a white paper on the new features of MPEG-DASH which I would like to highlight here.
  • Spatial Relationship Description (SRD): allows describing tiles and regions of interest for partial delivery of media presentations. This is highly related to OMAF and VR/360-degree video streaming.
  • External MPD linking: this feature allows describing the relationship between a single program/channel and a preview mosaic channel having all channels at once within the MPD.
  • Period continuity: simple signaling mechanism to indicate whether one period is a continuation of the previous one which is relevant for ad-insertion or live programs.
  • MPD chaining: allows for chaining two or more MPDs to each other, e.g., pre-roll ad when joining a live program.
  • Flexible segment format for broadcast TV: separates the signaling of the switching points and random access points in each stream and, thus, the content can be encoded with good compression efficiency while allowing a higher number of random access points but a lower frequency of switching points.
  • Server and network-assisted DASH (SAND): enables asynchronous network-to-client and network-to-network communication of quality-related assisting information.
  • DASH with server push and WebSockets: basically addresses issues related to HTTP/2 push feature and WebSocket.
CMAF issued a study document which captures the current progress and all national bodies are encouraged to take this into account when commenting on the Committee Draft (CD). To answer the question in the headline above, it looks more and more as if DASH and CMAF will become friends -- let's hope that the friendship lasts for a long time.
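Coming back to the Spatial Relationship Description (SRD) listed above, a small sketch may help to see how a client could read tile positions from an MPD. It is written in Python and rests on assumptions: the MPD fragment is made up for this example, and the code assumes that SRD is signaled via a property descriptor with the scheme `urn:mpeg:dash:srd:2014` whose `value` is a comma-separated list (source_id, object_x, object_y, object_width, object_height, and optionally total_width, total_height, spatial_set_id). Please check the MPEG-DASH specification for the normative details; the names `Srd`, `parse_srd`, and `tiles_in_mpd` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SRD_SCHEME = "urn:mpeg:dash:srd:2014"

# Hypothetical MPD fragment: two tiles side by side in a 3840x1920 panorama.
EXAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet id="1">
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
                            value="0,0,0,1920,1920,3840,1920"/>
    </AdaptationSet>
    <AdaptationSet id="2">
      <SupplementalProperty schemeIdUri="urn:mpeg:dash:srd:2014"
                            value="0,1920,0,1920,1920,3840,1920"/>
    </AdaptationSet>
  </Period>
</MPD>"""


@dataclass
class Srd:
    source_id: int
    x: int
    y: int
    width: int
    height: int
    total_width: Optional[int] = None
    total_height: Optional[int] = None
    spatial_set_id: Optional[int] = None


def parse_srd(value: str) -> Srd:
    """Parse the comma-separated SRD value string (5 to 8 integers assumed)."""
    fields = [int(part) for part in value.split(",")]
    if not 5 <= len(fields) <= 8:
        raise ValueError(f"unexpected number of SRD fields: {len(fields)}")
    return Srd(*fields)


def tiles_in_mpd(mpd_xml: str) -> List[Srd]:
    """Collect all SRD descriptors found anywhere in the MPD document."""
    root = ET.fromstring(mpd_xml)
    return [
        parse_srd(element.get("value", ""))
        for element in root.iter()
        if element.get("schemeIdUri") == SRD_SCHEME
    ]
```

Calling `tiles_in_mpd(EXAMPLE_MPD)` yields two `Srd` objects, from which a client could decide which tiles overlap the current viewport and request only those.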

What else happened at the MPEG meeting?

  • Committee Draft MORE (note: type in 'man more' on any unix/linux/mac terminal and you'll get 'less - opposite of more';): MORE stands for “Media Orchestration” and provides a specification that enables the automated combination of multiple media sources (cameras, microphones) into a coherent multimedia experience. Additionally, it targets use cases where a multimedia experience is rendered on multiple devices simultaneously, again giving a consistent and coherent experience.
  • Technical Report on HDR/WCG Video Coding: This technical report comprises conversion and coding practices for High Dynamic Range (HDR) and Wide Colour Gamut (WCG) video coding (ISO/IEC 23008-14). The purpose of this document is to provide a set of publicly referenceable recommended guidelines for the operation of AVC or HEVC systems adapted for compressing HDR/WCG video for consumer distribution applications.
  • CfP Point Cloud Compression (PCC): This call solicits technologies for the coding of 3D point clouds with associated attributes such as color and material properties. It will be part of the immersive media project introduced above.
  • MPEG-H 3D Audio verification test report: This report presents results of four subjective listening tests that assessed the performance of the Low Complexity Profile of MPEG-H 3D Audio. The tests covered a range of bit rates and a range of “immersive audio” use cases (i.e., from 22.2 down to 2.0 channel presentations). Seven test sites participated in the tests with a total of 288 listeners.
The next MPEG meeting will be held in Hobart, April 3-7, 2017. Feel free to contact us for any questions or comments.

Tuesday, July 28, 2015

MPEG news: a report from the 112th meeting, Warsaw, Poland



This blog post is also available at bitmovin tech blog and SIGMM records.

The 112th MPEG meeting in Warsaw, Poland was a special meeting for me. It was my 50th MPEG meeting which roughly accumulates to one year of MPEG meetings (i.e., one year of my life I've spent in MPEG meetings incl. traveling - scary, isn't it? ... more on this in another blog post). But what happened at this 112th MPEG meeting (my 50th meeting)...

  • Requirements: CDVA, Future of Video Coding Standardization (no acronym yet), Genome compression
  • Systems: M2TS (ISO/IEC 13818-1:2015), DASH 3rd edition, Media Orchestration (no acronym yet), TRUFFLE
  • Video/JCT-VC/JCT-3D: MPEG-4 AVC, Future Video Coding, HDR, SCC
  • Audio: 3D audio
  • 3DG: PCC, MIoT, Wearable
MPEG Friday Plenary. Photo (c) Christian Timmerer.
As usual, the official press release and other publicly available documents can be found here. Let's dig into the different subgroups:
Requirements

In the requirements subgroup, experts were working on the Call for Proposals (CfP) for Compact Descriptors for Video Analysis (CDVA) including an evaluation framework. The evaluation framework includes 800-1000 objects (large objects like building facades, landmarks, etc.; small(er) objects like paintings, books, statues, etc.; scenes like interior scenes, natural scenes, multi-camera shots) and the evaluation of the responses should be conducted for the 114th meeting in San Diego.

Work on the future of video coding standardization is currently happening in MPEG and paving the way for the successor of the HEVC standard. The current goal is providing (native) support for scalability (more than two spatial resolutions) and 30% compression gain for some applications (requiring a limited increase in decoder complexity) but actually preferred is 50% compression gain (at a significant increase of the encoder complexity). MPEG will hold a workshop at the next meeting in Geneva discussing specific compression techniques, objective (HDR) video quality metrics, and compression technologies for specific applications (e.g., multiple-stream representations, energy-saving encoders/decoders, games, drones). The current goal is having the International Standard for this new video coding standard around 2020.

MPEG has recently started a new project referred to as Genome Compression which is, of course, about the compression of genome information. A big dataset has been collected and experts are working on the Call for Evidence (CfE). The plan is to hold a workshop at the next MPEG meeting in Geneva regarding the prospects of Genome Compression and Storage Standardization, targeting users, manufacturers, service providers, technologists, etc.

Systems


Summer in Warsaw. Photo (c) Christian Timmerer.
The 5th edition of the MPEG-2 Systems standard has been published as ISO/IEC 13818-1:2015 on the 1st of July 2015 and is a consolidation of the 4th edition + Amendments 1-5.

In terms of MPEG-DASH, the draft text of ISO/IEC 23009-1 3rd edition comprising 2nd edition + COR 1 + AMD 1 + AMD 2 + AMD 3 + COR 2 is available for committee internal review. The expected publication date is scheduled for, most likely, 2016. Currently, MPEG-DASH includes a lot of activity in the following areas: spatial relationship description, generalized URL parameters, authentication, access control, multiple MPDs, full duplex protocols (aka HTTP/2 etc.), advanced and generalized HTTP feedback information, and various core experiments:
  • SAND (Server and Network Assisted DASH)
  • FDH (Full Duplex DASH)
  • SAP-Independent Segment Signaling (SISSI)
  • URI Signing for DASH
  • Content Aggregation and Playback COntrol (CAPCO)
In particular, the core experiment process is very open as most work is conducted during the Ad hoc Group (AhG) period which is discussed on the publicly available MPEG-DASH reflector.

MPEG systems recently started an activity that is related to media orchestration which applies to capture as well as consumption and concerns scenarios with multiple sensors as well as multiple rendering devices, including one-to-many and many-to-one scenarios resulting in a worthwhile, customized experience.

Finally, the systems subgroup started an exploration activity regarding real-time streaming of files (a.k.a. TRUFFLE) which should perform a gap analysis leading to extensions of the MPEG Media Transport (MMT) standard. However, some experts within MPEG concluded that most/all use cases identified within this activity could actually be solved with existing technology such as DASH. Thus, this activity may still need some discussions...

Video/JCT-VC/JCT-3D

The MPEG video subgroup is working towards a new amendment for the MPEG-4 AVC standard covering resolutions up to 8K and higher frame rates for lower resolution. Interestingly, although MPEG most of the time is ahead of industry, 8K and high frame rate is already supported in browser environments (e.g., using bitdash 8K, HFR) and modern encoding platforms like bitcodin. However, it's good that we finally have means for an interoperable signaling of this profile.

In terms of future video coding standardization, the video subgroup released a call for test material. Two sets of test sequences are already available and will be investigated regarding compression until next meeting.

After a successful call for evidence for High Dynamic Range (HDR), the technical work starts in the video subgroup with the goal to develop an architecture ("H2M") as well as three core experiments (optimization without HEVC specification change, alternative reconstruction approaches, objective metrics).

The main topic of the JCT-VC was screen content coding (SCC), which came up with new coding tools that better compress content that is (fully or partially) computer generated, leading to a significant improvement in compression: a rate reduction of approximately 50% or more for specific screen content.

Audio

The audio subgroup is mainly concentrating on 3D audio where they identified the need for intermediate bitrates between 3D audio phase 1 and 2. Currently, phase 1 identified 256, 512, 1200 kb/s whereas phase 2 focuses on 128, 96, 64, 48 kb/s. The broadcasting industry needs intermediate bitrates and, thus, phase 2 is extended to bitrates between 128 and 256 kb/s.

3DG

MPEG 3DG is working on point cloud compression (PCC) for which open source software has been identified. Additionally, there are new activities in the area of Media Internet of Things (MIoT) and wearable computing (like glasses and watches) that could lead to new standards developed within MPEG. Therefore, stay tuned on these topics as they may shape your future.

The week after the MPEG meeting I met the MPEG convenor and the JPEG convenor again during ICME2015 in Torino but that's another story...
L. Chiariglione, H. Hellwagner, T. Ebrahimi, C. Timmerer (from left to right) during ICME2015. Photo (c) T. Ebrahimi.



Wednesday, March 18, 2015

MPEG news: a report from the 111th meeting, Geneva, Switzerland

MPEG111 opening plenary.
This blog post is also available at SIGMM records.

The 111th MPEG meeting (note: link includes press release and all publicly available output documents) was held in Geneva, Switzerland, and brought up some interesting aspects which I’d like to highlight here. Undoubtedly, it was the shortest meeting I’ve ever attended (and my first meeting was #61) as the final plenary concluded at 2015/02/20T18:18!

In terms of the requirements (subgroup) it’s worth mentioning the call for evidence (CfE) for high-dynamic range (HDR) and wide color gamut (WCG) video coding which comprises a first milestone towards a new video coding format. The purpose of this CfE is to explore whether or not (a) the coding efficiency and/or (b) the functionality of the HEVC Main 10 and Scalable Main 10 profiles can be significantly improved for HDR and WCG content. In addition to that, the requirements subgroup issued a draft call for evidence on free viewpoint TV. Both documents are publicly available here.

The video subgroup continued discussions related to the future of video coding standardisation and issued a public document requesting contributions on “future video compression technology”. Interesting application requirements come from over-the-top streaming use cases which request HDR and WCG as well as video over cellular networks. Well, at least the former is something to be covered by the CfE mentioned above. Furthermore, features like scalability and perceptual quality are something that should be considered from the ground up and not (only) as an extension. Yes, scalability really helps a lot in OTT streaming: it eases content management, enables cache-efficient delivery, and allows for more aggressive buffer modelling and, thus, adaptation logic within the client, enabling better Quality of Experience (QoE) for the end user. It seems like complexity (at the encoder) is not so much a concern as long as it scales with cloud deployments such as http://www.bitcodin.com/ (e.g., the bitdash demo area shows some neat 4K/8K/HFR DASH demos which have been encoded with bitcodin). Closely related to 8K, there’s a new AVC amendment coming up covering 8K, although one can do it already today (see before), but it’s good to have standards support for this. For HEVC, the JCT-3D/VC issued the FDAM4 for 3D Video Extensions and started with PDAM5 for Screen Content Coding Extensions (both documents being publicly available after an editing period of about a month).

And what about audio? The audio subgroup has decided that ISO/IEC DIS 23008-3 3D Audio shall be promoted directly to IS, which means that the DIS was already in such a good state that only editorial comments are applied, which actually saves a balloting cycle. We have to congratulate the audio subgroup on this remarkable milestone.

Finally, I’d like to discuss a few topics related to DASH which is progressing towards its 3rd edition which will incorporate amendment 2 (Spatial Relationship Description, Generalized URL parameters and other extensions), amendment 3 (Authentication, Access Control and multiple MPDs), and everything else that will be incorporated within this year, like some aspects documented in the technologies under consideration or currently being discussed within the core experiments (CE).
Currently, MPEG-DASH conducts 5 core experiments:
  • Server and Network Assisted DASH (SAND)
  • DASH over Full Duplex HTTP-based Protocols (FDH)
  • URI Signing for DASH (CE-USD)
  • SAP-Independent Segment SIgnaling (SISSI)
  • Content aggregation and playback control (CAPCO)
The description of core experiments is publicly available and, compared to the previous meeting, we have a new CE which is about content aggregation and playback control (CAPCO) which “explores solutions for aggregation of DASH content from multiple live and on-demand origin servers, addressing applications such as creating customized on-demand and live programs/channels from multiple origin servers per client, targeted preroll ad insertion in live programs and also limiting playback by client such as no-skip or no fast forward.” This process is quite open and anybody can join by subscribing to the email reflector.

The CE for DASH over Full Duplex HTTP-based Protocols (FDH) is becoming a major activity and basically defines the usage of DASH with the push features of WebSockets and HTTP/2. At this meeting, MPEG issued a working draft, and the CE on Server and Network Assisted DASH (SAND) got its own part 5, which goes to CD, but the documents are not publicly available. However, I'm pretty sure I can report more on this next time, so stay tuned or feel free to comment here.

Friday, April 25, 2014

MPEG news: a report from the 108th meeting, Valencia, Spain

This blog post is also available at bitmovin tech blog and SIGMM records.

The 108th MPEG meeting was held at the Palacio de Congresos de Valencia in Spain featuring the following highlights (no worries about the acronyms, this is on purpose and they will be further explained below):
  • Requirements: PSAF, SCC, CDVA
  • Systems: M2TS, MPAF, Green Metadata
  • Video: CDVS, WVC, VCB
  • JCT-VC: SHVC, SCC
  • JCT-3D: MV/3D-HEVC, 3D-AVC
  • Audio: 3D audio 
Opening Plenary of the 108th MPEG meeting in Valencia, Spain.
The official MPEG press release can be downloaded from the MPEG Web site. Some of the above highlighted topics will be detailed in the following and, of course, there’s an update on DASH-related matters at the end.

As indicated above, MPEG is full of (new) acronyms and in order to become familiar with those, I’ve put them deliberately in the overview but I will explain them further below.

PSAF – Publish/Subscribe Application Format

Publish/subscribe corresponds to a new network paradigm related to content-centric networking (or information-centric networking) where the content is addressed by its name rather than location. An application format within MPEG typically defines a combination of existing MPEG tools jointly addressing the needs for a given application domain, in this case, the publish/subscribe paradigm. The current requirements and a preliminary working draft are publicly available.
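To illustrate the idea of addressing content by its name rather than its location, here is a minimal in-memory sketch in Python. It only demonstrates the general publish/subscribe paradigm and has nothing to do with the actual PSAF format; the class `NameBroker`, its methods, and the example names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the content name and the published payload.
Handler = Callable[[str, bytes], None]


class NameBroker:
    """Toy broker: subscribers register interest in a content name and
    never need to know where (on which host) the content comes from."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, name: str, handler: Handler) -> None:
        """Register a handler for content published under the given name."""
        self._subscribers[name].append(handler)

    def publish(self, name: str, payload: bytes) -> int:
        """Deliver the payload to all subscribers of this name and return
        how many handlers were notified."""
        handlers = self._subscribers.get(name, [])
        for handler in handlers:
            handler(name, payload)
        return len(handlers)
```

A subscriber would, for example, call `broker.subscribe("/mpeg/108/report", handler)` and a publisher `broker.publish("/mpeg/108/report", data)`; neither side refers to a server address, which is the key difference from location-based (host-centric) delivery.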

SCC – Screen Content Coding

I’ve introduced this topic in my previous report and at this meeting the responses to the CfP have been evaluated. In total, seven responses have been received which meet all requirements and, thus, the actual standardization work is transferred to JCT-VC. Interestingly, the results of the CfP are publicly available. Within JCT-VC, a first test model has been defined and core experiments have been established. I will report more on this as an output of the next meetings…

CDVA – Compact Descriptors for Video Analysis

This project has been renamed from compact descriptors for video search to compact descriptors for video analysis and comprises a publicly available vision statement. That is, interested parties are welcome to join this new activity within MPEG.

M2TS – MPEG-2 Transport Stream

At this meeting, various extensions to M2TS have been defined such as transport of multi-view video coding depth information and extensions to HEVC, delivery of timeline for external data as well as carriage of layered HEVC, green metadata, and 3D audio. Hence, M2TS is still very active and multiple amendments are developed in parallel.

MPAF – Multimedia Preservation Application Format

The committee draft for MPAF has been approved and, in this context, MPEG-7 is extended with additional description schemes.

Green Metadata

Well, this standard does not have its own acronym; it’s simply referred to as MPEG-GREEN. The draft international standard has been approved and national bodies will vote on it at the JTC 1 level. It basically defines metadata to allow clients to operate in an energy-efficient way. It comes along with amendments to M2TS and ISOBMFF that enable the carriage and storage of this metadata.

CDVS – Compact Descriptors for Visual Search

CDVS is at DIS stage and provides improvements on global descriptors as well as non-normative improvements of key-point detection and matching in terms of speedup and memory consumption. As with all standards at DIS stage, national bodies will vote on it at the JTC 1 level.

What’s new in the video/audio-coding domain?
  • WVC – Web Video Coding: This project reached final draft international standard with the goal to provide a video-coding standard for Web applications. It basically defines a profile of the MPEG-AVC standard including those tools not encumbered by patents.
  • VCB – Video Coding for Browsers: The committee draft for part 31 of MPEG-4 defines video coding for browsers and basically defines VP8 as an international standard. This is explains also the difference to WVC.
  • SHVC – Scalable HEVC extensions: As with SVC for AVC, SHVC will be defined as an amendment to HEVC and provides the same scalable video coding functionality as SVC.
  • MV/3D-HEVC, 3D-AVC: These are multi-view and 3D extensions for the HEVC and AVC standards respectively.
  • 3D Audio: Also, no acronym for this standard although I would prefer 3DA. However, CD has been approved at this meeting and the plan is to have DIS at the next meeting. At the same time, the carriage and storage of 3DA is being defined in M2TS and ISOBMFF respectively. 
Finally, what’s new in the media transport area, specifically DASH and MMT?

As interested readers know from my previous reports, DASH 2nd edition was approved some time ago. In the meantime, a first amendment to the 2nd edition is at draft amendment state including additional profiles (mainly adding xlink support) and time synchronization. A second amendment goes to the first ballot stage referred to as proposed draft amendment and defines spatial relationship description, generalized URL parameters, and other extensions. Eventually, these two amendments will be integrated in the 2nd edition which will become the MPEG-DASH 3rd edition. Also a corrigendum on the 2nd edition is currently under ballot and new contributions are still coming in, i.e., there is still a lot of interest in DASH. For your information – there will be two DASH-related sessions at Streaming Forum 2014.

On the other hand, MMT’s amendment 1 is currently under ballot and amendment 2 defines header compression and cross-layer interface. The latter has been progressed to a study document which will be further discussed at the next meeting. Interestingly, there will be an MMT developer’s day at the 109th MPEG meeting, as 4K/8K UHDTV services based on MMT specifications will be launched in Japan and implementation of MMT is now under way in Korea and China. The developer’s day will be on July 5th (Saturday), 2014, 10:00 – 17:00 at the Sapporo Convention Center. Therefore, if you don’t know anything about MMT, the developer’s day is certainly a place to be.

Contact:

Dr. Christian Timmerer
CIO bitmovin GmbH | christian.timmerer@bitmovin.net
Alpen-Adria-Universität Klagenfurt | christian.timmerer@aau.at

What else? That is, some publicly available MPEG output documents… (Dates indicate availability and end of editing period, if applicable, using the following format YY/MM/DD):
  • Text of ISO/IEC 13818-1:2013 PDAM 7 Carriage of Layered HEVC (14/05/02) 
  • WD of ISO/IEC 13818-1:2013 AMD Carriage of Green Metadata (14/04/04) 
  • WD of ISO/IEC 13818-1:2013 AMD Carriage of 3D Audio (14/04/04) 
  • WD of ISO/IEC 13818-1:2013 AMD Carriage of additional audio profiles & levels (14/04/04) 
  • Text of ISO/IEC 14496-12:2012 PDAM 4 Enhanced audio support (14/04/04) 
  • TuC on sample variants, signatures and other improvements for the ISOBMFF (14/04/04) 
  • Text of ISO/IEC CD 14496-22 3rd edition (14/04/04) 
  • Text of ISO/IEC CD 14496-31 Video Coding for Browsers (14/04/11) 
  • Text of ISO/IEC 15938-5:2005 PDAM 5 Multiple text encodings, extended classification metadata (14/04/04) 
  • WD 2 of ISO/IEC 15938-6:201X (2nd edition) (14/05/09) 
  • Text of ISO/IEC DIS 15938-13 Compact Descriptors for Visual Search (14/04/18) 
  • Test Model 10: Compact Descriptors for Visual Search (14/05/02) 
  • WD of ARAF 2nd Edition (14/04/18) 
  • Use cases for ARAF 2nd Edition (14/04/18) 
  • WD 5.0 MAR Reference Model (14/04/18) 
  • Logistic information for the 5th JAhG MAR meeting (14/04/04) 
  • Text of ISO/IEC CD 23000-15 Multimedia Preservation Application Format (14/04/18) 
  • WD of Implementation Guideline of MP-AF (14/04/04) 
  • Requirements for Publish/Subscribe Application Format (PSAF) (14/04/04) 
  • Preliminary WD of Publish/Subscribe Application Format (14/04/04) 
  • WD2 of ISO/IEC 23001-4:201X/Amd.1 Parser Instantiation from BSD (14/04/11) 
  • Text of ISO/IEC 23001-8:2013/DCOR1 (14/04/18) 
  • Text of ISO/IEC DIS 23001-11 Green Metadata (14/04/25) 
  • Study Text of ISO/IEC 23002-4:201x/DAM2 FU and FN descriptions for HEVC (14/04/04) 
  • Text of ISO/IEC 23003-4 CD, Dynamic Range Control (14/04/11) 
  • MMT Developers’ Day in 109th MPEG meeting (14/04/04) 
  • Results of CfP on Screen Content Coding Tools for HEVC (14/04/30) 
  • Study Text of ISO/IEC 23008-2:2013/DAM3 HEVC Scalable Extensions (14/06/06) 
  • HEVC RExt Test Model 7 (14/06/06) 
  • Scalable HEVC (SHVC) Test Model 6 (SHM 6) (14/06/06) 
  • Report on HEVC compression performance verification testing (14/04/25) 
  • HEVC Screen Content Coding Test Model 1 (SCM 1) (14/04/25) 
  • Study Text of ISO/IEC 23008-2:2013/PDAM4 3D Video Extensions (14/05/15) 
  • Test Model 8 of 3D-HEVC and MV-HEVC (14/05/15) 
  • Text of ISO/IEC 23008-3/CD, 3D audio (14/04/11) 
  • Listening Test Logistics for 3D Audio Phase 2 (14/04/04) 
  • Active Downmix Control (14/04/04) 
  • Text of ISO/IEC PDTR 23008-13 Implementation Guidelines for MPEG Media Transport (14/05/02) 
  • Text of ISO/IEC 23009-1 2nd edition DAM 1 Extended Profiles and availability time synchronization (14/04/18) 
  • Text of ISO/IEC 23009-1 2nd edition PDAM 2 Spatial Relationship Description, Generalized URL parameters and other extensions (14/04/18) 
  • Text of ISO/IEC PDTR 23009-3 2nd edition DASH Implementation Guidelines (14/04/18) 
  • MPEG vision for Compact Descriptors for Video Analysis (CDVA) (14/04/04) 
  • Plan of FTV Seminar at 109th MPEG Meeting (14/04/04) 
  • Draft Requirements and Explorations for HDR /WCG Content Distribution and Storage (14/04/04) 
  • Working Draft 2 of Internet Video Coding (IVC) (14/04/18) 
  • Internet Video Coding Test Model (ITM) v 9.0 (14/04/18) 
  • Uniform Timeline Alignment (14/04/18) 
  • Plan of Seminar on Hybrid Delivery at the 110th MPEG Meeting (14/04/04) 
  • WD 2 of MPEG User Description (14/04/04)
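
The YY/MM/DD dates in the list above can be read mechanically. Here is a minimal sketch in Python, assuming each entry ends with the date in parentheses; the function names are hypothetical and only serve this illustration.

```python
import re
from datetime import date, datetime

# Matches a trailing "(YY/MM/DD)" at the end of a list entry.
_DATE_AT_END = re.compile(r"\((\d{2}/\d{2}/\d{2})\)\s*$")

def parse_availability(stamp: str) -> date:
    """Turn a YY/MM/DD string such as '14/05/02' into a date.

    %y maps two-digit years 00-68 to 2000-2068, which fits these reports.
    """
    return datetime.strptime(stamp, "%y/%m/%d").date()

def availability_of(entry: str):
    """Return the availability date of a list entry, or None if it has none."""
    match = _DATE_AT_END.search(entry)
    return parse_availability(match.group(1)) if match else None

if __name__ == "__main__":
    line = "Text of ISO/IEC DIS 23001-11 Green Metadata (14/04/25)"
    print(availability_of(line).isoformat())
```

Anchoring the pattern at the end of the line means that entries with an earlier parenthesis, such as "(2nd edition)", still yield the date and not the edition note.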

Saturday, August 3, 2013

MPEG news: a report from the 105th meeting, Vienna, Austria

At the 105th MPEG meeting in Vienna, Austria, a lot of interesting things happened. First, this was not only the 105th MPEG meeting but also the 48th VCEG meeting, 14th JCT-VC meeting, 5th JCT-3V meeting, and 26th SC29 meeting bringing together more than 400 experts from more than 20 countries to discuss technical issues in the domain of coding of audio, [picture (SC29 only),] multimedia and hypermedia information. Second, it was the 3rd meeting hosted in Austria after the 62nd in July 2002 and 77th in July 2006. In 2002, “the new video coding standard being developed jointly with the ITU-T VCEG organization was promoted to Final Committee Draft (FCD)” and in 2006 "MPEG Surround completed its technical work and has been submitted for final FDIS balloting” as well as "MPEG has issued a Final Call for Proposals on MPEG-7 Query Format (MP7QF)”.

The official press release of the 105th meeting can be found here but I’d like to highlight a couple of interesting topics including research aspects covered or enabled by them. Research efforts may lead to standardization activities, but standardization also enables research, as you may see below.

MPEG selects technology for the upcoming MPEG-H 3D audio standard
Based on the responses submitted to the Call for Proposals (CfP) on MPEG-H 3D audio, MPEG selected technology supporting content based on multiple formats, i.e., channels and objects (CO) and higher order ambisonics (HOA). All submissions have been evaluated by comprehensive and standardized subjective listening tests followed by statistical analysis of the results. Interestingly, when taking the highest bitrate of 1.2 Mb/s with a 22.2 channel configuration, both of the selected technologies have achieved excellent quality and are very close to true transparency. That is, listeners cannot differentiate between the encoded and uncompressed bitstream. A first version of the MPEG-H 3D audio standard with higher bitrates of around 1.2 Mb/s to 256 kb/s should be available by March 2014 (Committee Draft - CD), July 2014 (Draft International Standard - DIS), and January 2015 (Final Draft International Standards - FDIS), respectively.

Research topics: Although the technologies have been selected, it's still a long way until the standard gets ratified by MPEG and published by ISO/IEC. Thus, there's a lot of space for researching efficient encoding tools including the subjective quality evaluations thereof. Additionally, it may impact the way 3D Audio bitstreams are transferred from one entity to another including file-based, streaming, on demand, and live services. Finally, within the application domain it may enable new use cases which are interesting to explore from a research point of view.

Augmented Reality Application Format reaches FDIS status

The MPEG Augmented Reality Application Format (ARAF, ISO/IEC 23000-13) enables the augmentation of the real world with synthetic media objects by combining multiple, existing standards within a single specific application format addressing certain industry needs. In particular, it combines standards providing representation formats for scene description (i.e., subset of BIFS), sensor/actuator descriptors (MPEG-V), and media formats such as audio/video coding formats. There are multiple target applications which may benefit from the MPEG ARAF standard, e.g., geolocation-based services, image-based object detection and tracking, mixed and augmented reality games and real-virtual interactive scenarios.

Research topics: Please note that MPEG ARAF only specifies the format to enable interoperability in order to support use cases enabled by this format. Hence, there are many research topics which could be associated to the application domains identified above.

What's new in Dynamic Adaptive Streaming over HTTP?

The DASH outcome of the 105th MPEG meeting comes with a couple of highlights. First, a public workshop was held on session management and control (#DASHsmc) which will be used to derive additional requirements for DASH. All position papers and presentations are publicly available here. Second, the first amendment (Amd.1) to part 1 of MPEG-DASH (ISO/IEC 23009-1:2012) has reached the final stage of standardization and together with the first corrigendum (Cor.1) and the existing part 1, the FDIS of the second edition of ISO/IEC 23009-1:201x has been approved. This includes support for event messages (e.g., to be used for live streaming and dynamic ad insertion) and a media presentation anchor which enables session mobility among others. Third and finally, the FDIS of conformance and reference software (ISO/IEC 23009-2) has been approved providing means for media presentation conformance, test vectors, a DASH access engine reference software, and various sample software tools.

Research topics: The MPEG-DASH conformance and reference software provides the ideal playground for researchers as it can be used both to generate and to consume bitstreams compliant to the standard. This playground could be used together with other open source tools from DASH-IF, GPAC, and DASH@ITEC. An overview about DASH@ITEC's open source suite can be found here.

HEVC support in MPEG-2 Transport Stream and ISO Base Media File Format

After the completion of High Efficiency Video Coding (HEVC) - ITU-T H.265 | MPEG HEVC at the 103rd MPEG meeting in Geneva, HEVC bitstreams can now be delivered using the MPEG-2 Transport Stream (M2TS) and files based on the ISO Base Media File Format (ISOBMFF). For the latter, the scope of the Advanced Video Coding (AVC) file format has been extended to also support HEVC and this part of MPEG-4 has been renamed to Network Abstraction Layer (NAL) file format. This file format now covers AVC and its family (Scalable Video Coding - SVC and Multiview Video Coding - MVC) but also HEVC.

Research topics: Research in the area of delivering audio-visual material is manifold and very well reflected in conference/workshops like ACM MMSys and Packet Video and associated journals and magazines. For these two particular standards, it would be interesting to see the efficiency of the carriage of HEVC with respect to the overhead.
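
To give a feeling for the kind of overhead question raised above, here is a rough sketch in Python. It only counts the fixed 4-byte header of each 188-byte transport stream packet and deliberately ignores PES headers, adaptation fields, and PSI tables, so the result is a lower bound and not a measurement. The function name and the payload sizes are made up for illustration.

```python
import math

TS_PACKET_SIZE = 188  # bytes per MPEG-2 transport stream packet
TS_HEADER_SIZE = 4    # fixed header; adaptation fields are ignored here

def ts_overhead(payload_bytes: int):
    """Return (packet count, overhead fraction) for a given payload size.

    Assumption: the payload is split over packets that each carry at most
    184 bytes, and the last packet is padded up to the full 188 bytes.
    """
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    per_packet = TS_PACKET_SIZE - TS_HEADER_SIZE
    packets = math.ceil(payload_bytes / per_packet)
    total = packets * TS_PACKET_SIZE
    return packets, (total - payload_bytes) / total

if __name__ == "__main__":
    for size in (184, 1000, 50_000):  # hypothetical payload sizes in bytes
        packets, overhead = ts_overhead(size)
        print(f"{size} bytes -> {packets} packets, {overhead:.2%} overhead")
```

A real evaluation would have to add the PES and adaptation-field bytes on top and compare the result against the box overhead of an ISOBMFF file carrying the same HEVC stream.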

Publicly available MPEG output documents

The following documents shall become available at http://mpeg.chiariglione.org/ (availability in brackets - YY/MM/DD). If you have difficulties accessing one of these documents, please feel free to contact me.
  • Requirements for HEVC image sequences (13/08/02)
  • Requirements for still image coding using HEVC (13/08/02)
  • Text of ISO/IEC 14496-16/PDAM4 Pattern based 3D mesh compression (13/08/02)
  • WD of ISO/IEC 14496-22 3rd edition (13/08/02)
  • Study text of DTR of ISO/IEC 23000-14, Augmented reality reference model (13/08/02)
  • Draft Test conditions for HEVC still picture coding performance evaluation (13/08/02)
  • List of stereo and 3D sequences considered (13/08/02)
  • Timeline and Requirements for MPEG-H Audio (13/08/02)
  • Working Draft 1 of Video Coding for browsers (13/08/31)
  • Test Model 1 of Video Coding for browsers (13/08/31)
  • Draft Requirements for Full Gamut Content Distribution (13/08/02)
  • Internet Video Coding Test Model (ITM) v 6.0 (13/08/23)
  • WD 2.0 MAR Reference Model (13/08/13)
  • Call for Proposals on MPEG User Description (MPEG-UD) (13/08/02)
  • Use Cases for MPEG User Description (13/08/02)
  • Requirements on MPEG User Description (13/08/02)
  • Text of white paper on MPEG Query Format (13/07/02)
  • Text of white paper on MPEG-7 AudioVisual Description Profile (AVDP) (13/07/02)

Wednesday, January 30, 2013

MPEG news: a report from the 103rd meeting, Geneva, Switzerland

MPEG plenary meeting at CICG in Geneva, CH
The 103rd MPEG meeting was held in Geneva, Switzerland, January 21-25, 2013. The official press release can be found here (doc only) and I'd like to introduce the new MPEG-H standard (ISO/IEC 23008) referred to as high efficiency coding and media delivery in heterogeneous environments:

  • Part 1: MPEG Media Transport (MMT) - status: 2nd committee draft (CD)
  • Part 2: High Efficiency Video Coding (HEVC) - status: final draft international standard (FDIS)
  • Part 3: 3D Audio - status: call for proposals (CfP)

MPEG Media Transport (MMT)

The MMT project was started in order to address the needs of modern media transport applications going beyond the capabilities offered by existing means of transportation such as formats defined by MPEG-2 transport stream (M2TS) or ISO base media file format (ISOBMFF) group of standards. The committee draft was approved during the 101st MPEG meeting. As a response to the CD ballot, MPEG received more than 200 comments from national bodies and, thus, decided to issue the 2nd committee draft which will be publicly available by February 7, 2013.

High Efficiency Video Coding (HEVC) - ITU-T H.265 | MPEG HEVC

HEVC is the next generation video coding standard jointly developed by ISO/IEC JTC1/SC29/WG11 (MPEG) and the Video Coding Experts Group (VCEG) of ITU-T WP 3/16. Please note that both ITU-T and ISO/IEC MPEG use the term "high efficiency video coding" in the title of the standard but one can expect - as with its predecessor - that the former will use ITU-T H.265 and the latter will use MPEG-H HEVC for promoting its standards. If you don't want to participate in this debate, simply use high efficiency video coding.

The MPEG press release says that the "HEVC standard reduces by half the bit rate needed to deliver high-quality video for a broad variety of applications" (note: compared to its predecessor AVC). The editing period for the FDIS goes until March 3, 2013 and then, with the final preparations and a 2 month balloting period (yes|no vote only), one can expect the International Standard (IS) to be available early summer 2013. Please note that there are no technical differences between FDIS and IS.

The ITU-T press release describes HEVC as a standard that "will provide a flexible, reliable and robust solution, future-proofed to support the next decade of video. The new standard is designed to take account of advancing screen resolutions and is expected to be phased in as high-end products and services outgrow the limits of current network and display technology."

HEVC currently defines three profiles:
  • Main Profile for the "Mass-market consumer video products that historically require only 8 bits of precision".
  • Main 10 Profile "will support up to 10 bits of processing precision for applications with higher quality demands".
  • Main Still Picture Profile to support still image applications, hence, "HEVC also advances the state-of-the-art for still picture coding"

3D Audio

The 3D audio standard shall complement MMT and HEVC assuming that in a "home theater" system a large number of loudspeakers will be deployed. Therefore, MPEG has issued a Call for Proposals (CfP) with the selection of the reference model v0 due in July 2013. The CfP says that MPEG-H 3D Audio "might be surrounding the user and be situated at high, mid and low vertical positions relative to the user’s ears. The desired sense of audio envelopment includes both immersive 3D audio, in the sense of being able to virtualize sound sources at any position in space, and accurate audio localization, in terms of both direction and distance."

"In addition to a “home theater” audio-visual system, there may be a “personal” system having a tablet-sized visual display with speakers built into the device, e.g. around the perimeter of the display. Alternatively, the personal device may be a hand-held smart phone. Headphones with appropriate spatialization would also be a means to deliver an immersive audio experience for all systems."

Complementary to the CfP, MPEG also provided the encoder input format for MPEG-H 3D audio and a draft MPEG audio core experiment methodology for 3D audio work.


Publicly available MPEG output documents

The following documents shall become available at http://mpeg.chiariglione.org/ (note: some may have an editing period - YY/MM/DD). If you have difficulties accessing one of these documents, please feel free to contact me.
  • Study text of DIS of ISO/IEC 23000-13, Augmented Reality Application Format (13/01/25)
  • Study text of DTR of ISO/IEC 23000-14, Augmented reality reference model (13/02/25)
  • Text of ISO/IEC FDIS 23005-1 2nd edition Architecture (13/01/25)
  • Text of ISO/IEC 2nd CD 23008-1 MPEG Media Transport (13/02/07)
  • Text of ISO/IEC 23008-2:201x/PDAM1 Range Extensions (13/03/22)
  • Text of ISO/IEC 23008-2:201x/PDAM2 Multiview Extensions (13/03/22)
  • Call for Proposals on 3D Audio (13/01/25)
  • Encoder Input Format for MPEG-H 3D Audio (13/02/08)
  • Draft MPEG Audio CE methodology for 3D Audio work (13/01/25)
  • Draft Requirements on MPEG User Descriptions (13/02/08)
  • Draft Call for Proposals on MPEG User Descriptions (13/01/25)
  • Draft Call for Proposals on Green MPEG (13/01/25)
  • Context, Objectives, Use Cases and Requirements for Green MPEG (13/01/25)
  • White Paper on State of the Art in compression and transmission of 3D Video (13/01/28)
  • MPEG Awareness Event Flyer at 104th MPEG meeting in Incheon (13/02/28)

Tuesday, October 23, 2012

MPEG news: a report from the 102nd meeting, Shanghai, China

The 102nd MPEG meeting was held in Shanghai, China, October 15-19, 2012. The official press release can be found here (not yet available) and I would like to highlight the following topics:
  • Augmented Reality Application Format (ARAF) goes DIS
  • MPEG-4 has now 30 parts: Let's welcome timed text and other visual overlays
  • Draft call for proposals for 3D audio 
  • Green MPEG is progressing
  • MPEG starts a new publicity campaign by making more working documents publicly available for free

Augmented Reality Application Format (ARAF) goes DIS

MPEG's application format dealing with augmented reality reached DIS status and is only one step away from becoming an international standard. In a nutshell, the MPEG ARAF enables augmenting 2D/3D regions of a scene by combining multiple/existing standards within a specific application format addressing certain industry needs. In particular, ARAF comprises three components referred to as scene, sensor/actuator, and media. The scene component is represented using a subset of MPEG-4 Part 11 (BIFS), the sensor/actuator component is defined within MPEG-V, and the media component may comprise various types of compressed (multi)media assets using different sorts of modalities and codecs.

A tutorial from Marius Preda, MPEG 3DG chair, at the Web3D conference in August 2012 is provided below.

MPEG-4 has now 30 parts

Let's welcome timed text and other visual overlays in the family of MPEG-4 standards. Part 30 of MPEG-4 - in combination with an amendment to the ISO base media file format (ISOBMFF) - addresses the carriage of W3C TTML including its derivative SMPTE Timed Text, as well as WebVTT. The types of overlays include subtitles, captions, and other timed text and graphics. The text-based overlays include basic text and XML-based text. Additionally, the standard provides support for bitmaps, fonts, and other graphics formats such as scalable vector graphics.

Draft call for proposals for 3D audio

MPEG 3D audio is concerned with various test items ranging from 9.1 over 12.1 up to 22.1 channel configurations. A public draft call for proposals has been issued at this meeting with the goal to finalize the call and the evaluation guidelines at the next meeting. The evaluation will be conducted in two phases. Phase one for higher bitrates (1.5 Mbps to 265 kbps) is foreseen to conclude in July 2013 with the evaluation of the answers to the call and the selection of the "Reference Model 0 (RM0)" technology which will serve as a basis for the development of a 3D audio standard. The second phase targets lower bitrates (96 kbps to 48 kbps) and builds on RM0 technology after this has been documented using text and code.

Green MPEG is progressing

The idea behind Green MPEG is to define signaling means that enable energy efficient encoding, delivery, decoding, and/or presentation of MPEG formats (and possibly others) without the loss of Quality of Experience. Green MPEG will address this issue from an end-to-end point of view with the focus - as usual - on the decoder. However, a codec-centric design is not desirable as the energy efficiency should not be achieved at the expense of the other components of the media ecosystem. At the moment, first requirements have been defined and everyone is free to join the discussions on the email reflector within the Ad-hoc Group.

MPEG starts a new publicity campaign by making more working documents publicly available for free

As a response to national bodies' comments, MPEG is from now on making more documents publicly available for free. Here's a selection of these documents which are publicly available here. Note that some may have an editing period and, thus, are not available at the time of writing this blog post.
  • Text of ISO/IEC 14496-15:2010/DAM 2 Carriage of HEVC (2012/11/02)
  • Text of ISO/IEC CD 14496-30 Timed Text and Other Visual Overlays in ISO Base Media File Format (2012/11/02)
  • DIS of ISO/IEC 23000-13, Augmented Reality Application Format (2012/11/07)
  • DTR of ISO/IEC 23000-14, Augmented reality reference model (2012/11/21)
  • Study of ISO/IEC CD 23008-1 MPEG Media Transport (2012/11/12)
  • High Efficiency Video Coding (HEVC) Test Model 9 (HM 9) Encoder Description (2012/11/30)
  • Study Text of ISO/IEC DIS 23008-2 High Efficiency Video Coding (2012/11/30)
  • Working Draft of HEVC Full Range Extensions (2012/11/02)
  • Working Draft of HEVC Conformance (2012/11/02)
  • Report of Results of the Joint Call for Proposals on Scalable High Efficiency Video Coding (SHVC) (2012/11/09)
  • Draft Call for Proposals on 3D Audio (2012/10/19)
  • Text of ISO/IEC 23009-1:2012 DAM 1 Support for Event Messages and Extended Audio Channel Configuration (2012/10/31)
  • Internet Video Coding Test Model (ITM) v 3.0 (2012/11/02)
  • Draft Requirements on MPEG User Descriptions (2012/10/19)
  • Draft Use Cases for MPEG User Description (Ver. 4.0) (2012/10/19)
  • Requirements on Green MPEG (2012/10/19)
  • White Paper on State of the Art in compression and transmission of 3D Video (Draft) (2012/10/19)
  • White Paper on Compact Descriptors for Visual Search (2012/11/09)

Thursday, August 2, 2012

MPEG news: a report from the 101st meeting, Stockholm, Sweden

The 101st MPEG meeting was held in Stockholm, Sweden, July 16-20, 2012. The official press release can be found here and I would like to highlight the following topics:
  • MPEG Media Transport (MMT) reaches Committee Draft (CD)
  • High-Efficiency Video Coding (HEVC) reaches Draft International Standard (DIS)
  • MPEG and ITU-T establish JCT-3V
  • Call for Proposals: HEVC scalability extensions
  • 3D audio workshop
  • Green MPEG
MMT goes CD

The Committee Draft (CD) of MPEG-H part 1 referred to as MPEG Media Transport (MMT) has been approved and will be publicly available after an editing period which will end Sep 17th. MMT comprises the following features:
  • Delivery of coded media by concurrently using more than one delivery medium (e.g., as it is the case of heterogeneous networks).
  • Logical packaging structure and composition information to support multimedia mash-ups (e.g., multiscreen presentation).
  • Seamless and easy conversion between storage and delivery formats.
  • Cross layer interface to facilitate communication between the application layers and underlying delivery layers.
  • Signaling of messages to manage the presentation and optimized delivery of media.
This list of 'features' may sound very high-level but as the CD usually comprises stable technology and is publicly available, the research community is more than welcome to evaluate MPEG's new way of media transport. Having said this, I would like to refer to the Call for Papers of JSAC's special issue on adaptive media streaming which is mainly focusing on DASH, but investigating its relationship to MMT is definitely within the scope.

HEVC's next step towards completion: DIS

The approval of the Draft International Standard (DIS) brought the HEVC standard one step closer to completion. As reported previously, HEVC shows superior performance gains compared to its predecessor and real-time software decoding on the iPad 3 (720p, 30Hz, 1.5 Mbps) has been demonstrated during the Friday plenary [1, 2]. It is expected that the Final Draft International Standard (FDIS) is going to be approved at the 103rd MPEG meeting on January 21-25, 2013. If the market need for HEVC is only similar to what it was when AVC was finally approved, I am wondering if one can expect first products by mid/end 2013. From a research point of view we know - and history is our witness - that improvements are still possible even if the standard has been approved some time ago. For example, the AVC standard is now available in its 7th edition as a consolidation of various amendments and corrigenda.

JCT-3V

After the Joint Video Team (JVT) which successfully developed standards such as AVC, SVC, MVC and the Joint Collaborative Team on Video Coding (JCT-VC), MPEG and ITU-T establish the Joint Collaborative Team on 3D Video coding extension development (JCT-3V). That is, from now on MPEG and ITU-T also join forces in developing 3D video coding extensions for existing codecs as well as the ones under development (i.e., AVC, HEVC). The current standardization plan includes the development of AVC multi-view extensions with depth to be completed this year and I assume HEVC will be extended with 3D capabilities once the 2D version is available.

In this context it is interesting that a call for proposals for MPEG Frame Compatible (MFC) has been issued to address current deployment issues of stereoscopic videos. The requirements are available here.

Call for Proposals: SVC for HEVC

In order to address the need for higher resolutions - Ultra HDTV - and subsets thereof, JCT-VC issued a call for proposals for HEVC scalability extensions. Similar to AVC/SVC, the requirements include that the base layer should be compatible with HEVC and enhancement layers may include temporal, spatial, and fidelity scalability. The actual call, the use cases, and the requirements shall become available on the MPEG Web site.

MPEG hosts 3D Audio Workshop

Part 3 of MPEG-H will be dedicated to audio, specifically 3D audio. The call for proposals will be issued at the 102nd MPEG meeting in October 2012 and submissions will be due at the 104th meeting in April 2013. At this meeting, MPEG has hosted a 2nd workshop on 3D audio with the following speakers:
  • Frank Melchior, BBC R&D: “3D Audio? - Be inspired by the Audience!”
  • Kaoru Watanabe, NHK and ITU: “Advanced multichannel audio activity and requirements”
  • Bert Van Daele, Auro Technologies: “3D audio content production, post production and distribution and release”
  • Michael Kelly, DTS: “3D audio, objects and interactivity in games”
The report of this workshop including the presentations will be publicly available by end of August at the MPEG Web site.

What's new: Green MPEG

Finally, MPEG is starting to explore a new area which is currently referred to as Green MPEG addressing technologies to enable energy-efficient use of MPEG standards. Therefore, an Ad-hoc Group (AhG) was established with the following mandates:

  1. Study the requirements and use-cases for energy efficient use of MPEG technology.
  2. Solicit further evidence for the energy savings.
  3. Develop reference software for Green MPEG experimentation and upload any such software to the SVN.
  4. Survey possible solutions for energy-efficient video processing and presentation.
  5. Explore the relationship between metadata types and coding technologies.
  6. Identify new metadata that will enable additional power savings.
  7. Study system-wide interactions and implications of energy-efficient processing on mobile devices.
AhGs are usually open to the public and all discussions take place via email. To subscribe please feel free to join the email reflector.

Thursday, March 22, 2012

MPEG news: a report from the 99th meeting, San Jose, CA, USA

The official press release is available here and I'd like to highlight two topics from MPEG's 99th meeting in San Jose, CA, USA:
  • HEVC advances to Committee Draft (CD)
  • Public workshop on MPEG-H 3D Audio
High-Efficiency Video Coding reaches first formal milestone towards completion

As described in the official press release "ISO/IEC’s Moving Picture Experts Group (MPEG) is pleased to announce the completion of the ISO/IEC committee draft of the High Efficiency Video Coding (HEVC) standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), a joint team between MPEG and the ITU-T’s Video Coding Experts Group (VCEG)". For those who are not familiar with the ISO/IEC standardization process, committee draft (CD) means that the standard is not yet finalized but entering the committee stage which enables national bodies to comment on the standard. That is, changes to HEVC can only be made through national body comments which need to be registered in due time.

In terms of performance of HEVC one can conclude that the mission is accomplished. Preliminary HM5 vs. AVC subjective performance comparison looks impressive, i.e. > 50% bitrate reduction overall, specifically 67% in HD and 49% for WVGA sequences. Please note that these results are not validated through official verification tests which are usually conducted in a later stage of the standardization process.
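
As a rough illustration of what these percentages mean in practice, the following Python sketch converts a bitrate reduction into the bitrate needed for similar subjective quality. The 8000 kb/s AVC figure is a made-up example and not taken from the test results; only the 67% and 49% values come from the comparison quoted above.

```python
def bitrate_after_reduction(reference_kbps: float, reduction: float) -> float:
    """Bitrate needed after a relative reduction (0.5 means 50% less)."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return reference_kbps * (1.0 - reduction)

if __name__ == "__main__":
    avc_kbps = 8000.0  # hypothetical AVC bitrate, for illustration only
    for label, reduction in (("HD", 0.67), ("WVGA", 0.49)):
        hevc_kbps = bitrate_after_reduction(avc_kbps, reduction)
        print(f"{label}: about {hevc_kbps:.0f} kb/s in place of {avc_kbps:.0f} kb/s")
```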

From a deployment perspective, one profile is currently foreseen, preliminarily referred to as the "main" profile, with a largest coding unit (LCU) between 16x16 and 64x64 and a maximum picture storage capacity that is always 6 (compared to AVC, where the maximum is 16), among others.
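
To get a feeling for the LCU range mentioned above, here is a small Python sketch that counts how many LCUs are needed to cover a picture. The 1920x1080 picture size is my own example, and the function name is hypothetical; LCUs at the right and bottom border are simply counted as full units.

```python
import math

def lcu_count(width: int, height: int, lcu_size: int) -> int:
    """Number of LCUs covering a picture; border LCUs count as whole units."""
    if lcu_size not in (16, 32, 64):
        raise ValueError("this sketch only covers LCU sizes 16, 32 and 64")
    return math.ceil(width / lcu_size) * math.ceil(height / lcu_size)

if __name__ == "__main__":
    for size in (16, 32, 64):
        print(f"{size}x{size}: {lcu_count(1920, 1080, size)} LCUs")
```

For this picture size, 64x64 LCUs give 30 x 17 = 510 units while 16x16 LCUs give 120 x 68 = 8160 units.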

Research issues: in my last report I wrote "the ultimative goal to have a performance gain of more than 50% compared to the predecessor which is AVC". It seems this has been achieved so one might wonder what else needs to be done. In practice, however, there is always space for improvement, right?

The next step in audio coding: MPEG-H 3D Audio

The MPEG-H 3D Audio Workshop attracted more than 100 attendees who followed presentations covering three areas of 3D audio.
  1. ATSC 3.0 and the Future of Broadcast Television (FoBTV)
  2. 22.2 multichannel sound for Ultra High Definition TV (UHDTV), Next Generation Broadcast Television, and New Heights in Multichannel Sound: Explorations and Considerations
  3. Realistic audio representation technologies for UHDTV, backward-compatible 3D audio coding, and innovating beyond 5.1.
The presentations are publicly available here within a single ZIP file. MPEG established an AhG on 3D Audio (and Audio Maintenance) with the following mandates (among others):
  • Progress possible use cases, requirements and evaluation methods for 3D Audio 
  • Identify test material appropriate for 3D Audio work and a process to make the material available to interested MPEG delegates.
Subscription to the reflector is open to everyone. A possible timeline for part 3 of MPEG-H could mean to have a Call for Proposals (CfP) in July 2012 followed by the evaluation in January 2013, all preliminary, no guarantee.

Finally, the next meeting will be MPEG's 100th meeting which will include a social event with participation of representatives from ITU, ISO, IEC, and others.