Monday, December 7, 2009

O Universal Multimedia Access, Where Art Thou? (Part I)

-by Christian Timmerer, Klagenfurt University, Austria

Preface: At first I considered writing this article for a journal or a similar venue, but then I decided to make it available through my blog. The aim is to run an experiment to determine whether it is possible (a) to get direct feedback through comments and (b) to be referenced from elsewhere. As the article is quite comprehensive, it is split into separate parts. If someone (e.g., a journal editor) is interested in publishing this article, yes, I can still do that! :-)

Part I – Introduction and Multimedia Content Adaptation Techniques


Back in 1999, an article entitled “Adapting Multimedia Internet Content for Universal Access” [1] was published in vol. 1/no. 1 of IEEE Transactions on Multimedia. It can be roughly seen as the kick-off for a research effort that subsequent papers collectively referred to as Universal Multimedia Access (UMA). The initial aim of UMA was to provide access to multimedia content anywhere, anytime, and with any device. Now, more than 10 years later, I think it is worth looking back and reviewing what has been achieved so far.

Some argue, and I tend to agree, that the key to UMA is multimedia content adaptation [2], as depicted in Figure 1. The aim is to transform an input into an output in video or augmented multimedia form, using manipulations at multiple levels (e.g., signal, structural, or semantic), in order to meet diverse resource constraints and user preferences while optimizing the overall utility of the multimedia content.



Figure 1. Concept of Multimedia Content Adaptation – adapted from [2].

How to adapt?

  • Temporal scaling: reduce number of frames
  • Spatial scaling: reduce number of pixels => reduce resolution
  • Frequency scaling: reduce number of DCT coefficients => reduce quality
  • Modality conversion: e.g., video to slide show also known as transmoding
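
The first three techniques in the list above can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: frames are plain nested lists of grayscale values, spatial scaling is a fixed 2:1 reduction by averaging 2x2 blocks, and the DCT coefficients are assumed to be already arranged from low to high frequency. The function names are made up for this illustration and do not come from any codec library.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame: rows of pixel values (assumed layout)


def temporal_scale(frames: List[Frame], keep_every: int) -> List[Frame]:
    """Temporal scaling: keep only every `keep_every`-th frame."""
    return frames[::keep_every]


def spatial_scale(frame: Frame) -> Frame:
    """Spatial scaling: halve width and height by averaging 2x2 pixel blocks."""
    out = []
    for y in range(0, len(frame) - 1, 2):
        row = []
        for x in range(0, len(frame[y]) - 1, 2):
            total = (frame[y][x] + frame[y][x + 1]
                     + frame[y + 1][x] + frame[y + 1][x + 1])
            row.append(total // 4)
        out.append(row)
    return out


def frequency_scale(coeffs: List[int], keep: int) -> List[int]:
    """Frequency scaling: keep the first `keep` coefficients, zero the rest.

    Assumes `coeffs` is ordered from low to high frequency (e.g., zigzag order).
    """
    return coeffs[:keep] + [0] * max(0, len(coeffs) - keep)
```

For example, `temporal_scale(frames, 3)` turns a 30 fps sequence into a 10 fps one, while `frequency_scale` trades quality for bit rate without touching the resolution.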

Where to adapt?

  • Server, Proxy, Router, Gateway, Client, …

When to adapt?

  • A server could hold several variations of the same multimedia content – or – could react to changing (network) conditions
  • A proxy could adapt cached multimedia content in order to free space – or – could adapt it in an ad-hoc mode or on-demand
  • A router or gateway could drop marked segments (e.g., packets)
  • A client could subscribe only to those streams it can handle
  • etc.
Based on the observations above, multimedia content adaptation can be roughly categorized into adaptation by selection, adaptation by transcoding, and adaptation by transformation.

Adaptation by Selection


The idea here is to provide multiple versions of the same multimedia content and then select, or switch to, the most appropriate version according to the usage context. The InfoPyramid framework [1][3] was among the first approaches following this adaptation paradigm. To this end, content descriptions are associated with the individual components of the multimedia content; they describe the content in different modalities, at different resolutions, and at multiple levels of abstraction (cf. Figure 2).



Figure 2. InfoPyramid Framework – adapted from [3].

  • Multi-modal: Multimedia content is usually not in a single media format, or modality. A video clip can contain raw data from video, audio in two or more languages, closed captions, etc.
  • Multi-resolution: Each content component can also be described at multiple resolutions. Numerous resolution reduction techniques exist for constructing image and video resources.
  • Multiple-abstraction levels: The abstraction levels describe features and data in a hierarchical fashion. For example, one hierarchy could be features, semantics and object descriptions, and annotations and metadata itself.
In order to access the actual content one has to define methods for manipulating, translating, transcoding, and generating content which can happen in offline or online mode.
  • The offline mode generates variations as described by InfoPyramid before service deployment. On service request one can choose or select the prepared variations based on the InfoPyramid description. The variations are generated by applying appropriate adaptation techniques (e.g., transcoding) offline which indeed increases storage and asset management requirements.
  • The online mode provides the appropriate variation on-the-fly based on the InfoPyramid description and during the actual service request. Again, appropriate adaptation techniques (e.g., transcoding) are applied but this time online which increases processing (CPU) requirements, delay, etc.
However, in general it is difficult to anticipate and provide every required version of the multimedia content given the large variety of formats, bit rates, etc. Furthermore, in offline mode one needs to maintain and manage all these different versions, which wastes storage capacity. On the other hand, this approach yields good performance and little quality degradation.
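
The selection step itself can be sketched in a few lines of Python. This is a minimal sketch and not the InfoPyramid implementation: the `Variation` and `UsageContext` classes, their fields, and the rule "pick the feasible variation with the highest bit rate" are all assumptions made for illustration, with bit rate standing in as a crude proxy for utility.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variation:
    """One pre-generated version of the content (hypothetical fields)."""
    name: str
    modality: str       # e.g., "video", "image", "text"
    width: int
    height: int
    bitrate_kbps: int


@dataclass(frozen=True)
class UsageContext:
    """Constraints of terminal and network (hypothetical fields)."""
    max_width: int
    max_height: int
    bandwidth_kbps: int
    modalities: Tuple[str, ...]


def select_variation(variations: List[Variation],
                     ctx: UsageContext) -> Optional[Variation]:
    """Return the feasible variation with the highest bit rate, or None."""
    feasible = [
        v for v in variations
        if v.modality in ctx.modalities
        and v.width <= ctx.max_width
        and v.height <= ctx.max_height
        and v.bitrate_kbps <= ctx.bandwidth_kbps
    ]
    if not feasible:
        return None  # nothing prepared offline fits; online adaptation would be needed
    return max(feasible, key=lambda v: v.bitrate_kbps)
```

The `None` case corresponds to the difficulty mentioned above: if no prepared variation fits the usage context, the offline mode has nothing to offer and an online adaptation step is required.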

Adaptation by Transcoding


Although transcoding may be used as a tool within the previous adaptation paradigm, it is listed here as a separate approach due to its importance in both literature and industry [5][6][7]. The objective of transcoding is to satisfy usage environment constraints while maximizing the content value (objective/subjective quality) and minimizing the actual transcoding complexity. In general, one can distinguish between re-coding and trans-coding.

Conventional approaches – recode – perform full decoding, post-processing, and full re-encoding, as shown in Figure 3. This usually yields the highest quality but is expensive, and in many cases (real-time operation) it requires a hardware-based solution.



Figure 3. Conventional approaches – recode.
Low-cost approaches – transcode – target a quality similar to that of the conventional approach but at lower complexity. The focus is on architectures that use compressed-domain processing, which enables software solutions to be deployed (cf. Figure 4).


Figure 4. Low-cost approaches – transcode.
In the following, common transcoding operations are briefly highlighted:
  • Bit-rate reduction – sometimes also known as transrating (e.g., SDTV: 6 Mbps => 3 Mbps, HDTV: 19.2 Mbps => 11 Mbps): The main challenges here are drift compensation due to re-quantization errors, the rate control algorithm, and the trade-off between quality and complexity. A vast number of solutions have been proposed in the literature, and it is nearly impossible to summarize them all. Nevertheless, a good overview is given in [8].
  • Temporal resolution reduction (e.g., 30 fps => 10 fps): Frame dropping raises a couple of issues: how to estimate a new motion vector from the incoming motion vectors while avoiding full motion vector re-estimation, and how to estimate a new residual from the incoming residual values while minimizing the mismatch between predictive and residual components. Some approaches are described in [7][9].
  • Spatial resolution reduction (e.g., HDTV => SDTV; 720x480i, 30Hz => 352x240p 10Hz):
    • Motion vectors corresponding to reduced resolution reference frame => frame-based & field-based motion vector mapping.
    • Obtaining texture information for lower-resolution macroblocks (MBs) => simple averaging (frame-based or field-based, computationally efficient) or block-based filters (typically more complex than required).
    • Drift compensation architecture due to re-quantization and down-sampling => cascaded architecture (full decoding/re-encoding), partial encoding architecture (full decoding followed by a partial encoder), and intra refresh architecture (open-loop architecture).
  • Error-resilience enhancement: Improve the robustness of the bitstream for transmission, or use retransmitted frames as references even if they arrive too late to be displayed. With such an approach, error propagation is eliminated, whereas it would persist if retransmitted frames were discarded [7].
  • Syntax conversion (e.g., MPEG-2 Transport Stream => MPEG-2 Program Stream for DVD Recording; MPEG-2 => MPEG-4 for Broadcast to Mobile): This operation is often referred to as the classical transcoding operation and was/is the main driving use case for UMA. It usually benefits from the operations introduced above and is used in certain combinations. Recently, bit-stream rewriting has been introduced which allows for syntax conversion within a given family of video coding standards (e.g., SVC-to-AVC [10] or AVC-to-SVC [11]).
  • Modality conversion – sometimes also known as transmoding (e.g., video => slideshow; text => speech): The objective here is to modify the modality (e.g., audio, video, image, text) in order to satisfy transmission constraints and/or user preferences [12][13][14].
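
Two of the compressed-domain operations listed above, re-quantization for bit-rate reduction and motion vector reuse for temporal and spatial resolution reduction, can be sketched as follows. This is a deliberately simplified Python illustration and not one of the architectures from [5]-[9]: it assumes a uniform quantizer with a single step size, approximates the vector across dropped frames by plain summation, and maps four macroblock vectors to one by averaging and halving for a 2:1 reduction. All function names are hypothetical.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]  # (dx, dy) in pixels


def requantize(levels: List[int], q_in: int, q_out: int) -> List[int]:
    """Open-loop re-quantization: dequantize with q_in, quantize with coarser q_out."""
    out = []
    for level in levels:
        coeff = level * q_in                              # reconstructed coefficient
        magnitude = (abs(coeff) + q_out // 2) // q_out    # round to nearest level
        out.append(magnitude if coeff >= 0 else -magnitude)
    return out


def requantization_error(levels: List[int], q_in: int, q_out: int) -> List[int]:
    """Per-coefficient error introduced by re-quantization.

    In predicted frames this error accumulates as drift unless it is compensated.
    """
    new_levels = requantize(levels, q_in, q_out)
    return [new * q_out - old * q_in for old, new in zip(levels, new_levels)]


def compose_skipped(mvs: List[MotionVector]) -> MotionVector:
    """Frame dropping: approximate the vector to the last kept frame by summing
    the incoming vectors across the dropped frames (simplifying assumption)."""
    return (sum(dx for dx, _ in mvs), sum(dy for _, dy in mvs))


def downscale_mv(mvs: List[MotionVector]) -> Tuple[float, float]:
    """2:1 spatial reduction: average the vectors of the merged macroblocks,
    then halve the result to match the reduced resolution."""
    n = len(mvs)
    return (sum(dx for dx, _ in mvs) / (2 * n),
            sum(dy for _, dy in mvs) / (2 * n))
```

Both motion vector functions avoid a full motion re-estimation by reusing the incoming vectors, which is the source of the complexity savings and also of the mismatch between prediction and residual mentioned above.
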
This is the end of Part I and I will continue in Part II with the adaptation by transformation approach that utilizes scalable coding formats such as JPEG2000, MPEG-4 BSAC, and MPEG-4 SVC. Thus, stay tuned!

References:
[1] R. Mohan, J. R. Smith, C.-S. Li, “Adapting Multimedia Internet Content for Universal Access,” IEEE Transactions on Multimedia, vol. 1, no. 1, 1999, pp. 104-114.
[2] S.F. Chang and A. Vetro, “Video Adaptation: Concepts, Technologies and Open Issues”, Proceedings of the IEEE, vol. 93, no. 1, Jan. 2005, pp. 148-158.
[3] C.-S. Li, R. Mohan and J.R. Smith, “Multimedia Content Description in the InfoPyramid”, Proceedings ICASSP’98, Special session on Signal Processing in Modern Multimedia Standards, Seattle, May 1998.
[4] B. Shen, W-T. Tan, F. Huve, “Dynamic Video Transcoding in Mobile Environments”, IEEE Multimedia, vol. 15, no. 1, Jan.-Mar. 2008, pp. 42-51.
[5] A. Vetro, C. Christopoulos and H. Sun, “An overview of video transcoding architectures and techniques”, IEEE Signal Processing Magazine, vol. 20, no. 2, Mar. 2003, pp. 18-29.
[6] J. Xin, C.W. Lin, M.T. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, vol. 93, no. 1, Jan. 2005, pp. 84-97.
[7] B. Shen, W-T. Tan, F. Huve, “Dynamic Video Transcoding in Mobile Environments”, IEEE Multimedia, vol. 15, no. 1, Jan.-Mar. 2008, pp. 42-51.
[8] S. Liu, A. Bovik, “Digital Video Transcoding”, in A. Bovik, The Essential Guide to Video Processing, Academic Press, 2009.
[9] Fung, et al., “New architecture for dynamic frame skipping transcoder”, IEEE Transactions on Image Processing, vol. 11, no. 8, Aug. 2002, pp. 886-900.
[10] A. Segall, J. Zhao, “Bit-stream rewriting for SVC-to-AVC conversion”, 15th International Conference on Image Processing (ICIP2008), San Diego, USA, Oct. 2008, pp. 2776-2779.
[11] J. De Cock, S. Notebaert, P. Lambert, R. Van de Walle, “Advanced bitstream rewriting from H.264/AVC to SVC”, 15th International Conference on Image Processing (ICIP2008), San Diego, USA, Oct. 2008, pp. 2472-2475.
[12] T. C. Thang, Y. J. Jung, J. W. Lee, Y. M. Ro, “Modality Conversion for Universal Multimedia Services”, Proceedings of the 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2004), Lisboa, Portugal, April 2004.
[13] T. C. Thang, Y. J. Jung, and Y. M. Ro, “Modality Conversion for QoS Management in Universal Multimedia Access”, IEE Proceedings: Vision, Image & Signal Processing, vol. 152, no. 3, Jun. 2005, pp.374-384.
[14] M.K. Asadi, J.-C. Dufourd, “Multimedia Adaptation by Transmoding in MPEG-21”, Proceedings of the 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2004), Lisboa, Portugal, April 2004.
