Tuesday, December 22, 2009

O Universal Multimedia Access, Where Art Thou? (Part II)

-by Christian Timmerer, Klagenfurt University, Austria

Preface: First I thought about writing this article for a journal or something equivalent, but then I decided to make it available through my blog. The aim is to perform an experiment in order to determine whether it is possible (a) to get direct feedback through comments and (b) to be referenced from elsewhere. As it is quite a comprehensive article, it is split into separate parts. If someone (e.g., a journal editor) is interested in publishing this article, yes, I can still do that! :-)

Part I was about giving an introduction to the topic and an overview on multimedia content adaptation techniques. This part focuses on the adaptation by transformation approach that utilizes scalable coding formats such as JPEG2000, MPEG-4 BSAC, and MPEG-4 SVC and is mainly based on [1].

Part II – Adaptation by Transformation


Scalable coding techniques have been recognized as an appropriate tool for realizing the concepts of UMA. Furthermore, if widely adopted across industries, scalable coding would provide a generalized solution to the interoperability problem.

In [2], a scalable bitstream is defined as a coded multimedia resource (i.e., audio-visual multimedia resources) consisting of a structured sequence of binary symbols which is organized in such a way that, by retrieving the bitstream, it is possible to first render a degraded version of the bitstream, and then progressively improve it by loading additional data. This definition implies a bitstream structure where the bitstream can be logically divided into several layers, i.e., a base layer and one or more enhancement layers. The base layer offers a minimal quality of the bitstream whereas each of the enhancement layers successively provides improvements with respect to the quality in various dimensions. These dimensions include improvements in the temporal, spatial, signal-to-noise ratio (SNR), color, region-of-interest (ROI), and complexity domains, among others. Recently, abstract models describing scalable bitstreams have been proposed [3][4] which are briefly reviewed in the following.

In general, a scalable bitstream can be organized in a logical hypercube model where each axis represents a scalability dimension (e.g., temporal, spatial, quality) and every data block within this model corresponds to a certain bitstream segment (cf. Figure 1). The adaptation of a bitstream corresponding to such a model comprises the removal of one or more data blocks, sometimes followed by minor updates of the remaining data blocks. See [4] for a more detailed overview of adaptation possibilities.

Figure 1. Scalability Model using the Hypercube Model according to [3][4].
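To make the hypercube model a bit more tangible, here is a minimal Python sketch of my own; it is not taken from [3] or [4]. It assumes three axes (temporal, spatial, quality), represents each data block as a placeholder byte string, and treats adaptation purely as the removal of blocks above a target layer on each axis. The function names `build_hypercube` and `adapt` are hypothetical, and the "minor updates of the remaining data blocks" mentioned above are deliberately left out.

```python
from itertools import product


def build_hypercube(n_temporal, n_spatial, n_quality):
    """Map each (t, s, q) layer coordinate to a placeholder data block."""
    return {
        (t, s, q): f"block[t={t},s={s},q={q}]".encode()
        for t, s, q in product(range(n_temporal), range(n_spatial), range(n_quality))
    }


def adapt(cube, max_t, max_s, max_q):
    """Keep only blocks that do not exceed the target layer on any axis."""
    return {
        coord: block
        for coord, block in cube.items()
        if coord[0] <= max_t and coord[1] <= max_s and coord[2] <= max_q
    }


cube = build_hypercube(3, 2, 2)   # 3 temporal x 2 spatial x 2 quality layers
base = adapt(cube, 0, 0, 0)       # base layer only
mid = adapt(cube, 1, 1, 0)        # two temporal and two spatial layers, base quality
print(len(cube), len(base), len(mid))  # prints "12 1 4"
```

A real bitstream would additionally need the blocks serialized in a progression order, which this sketch ignores.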


In the following I'd like to introduce some coding formats and their scalability features using the hypercube model introduced above, namely:

  • JPEG2000 which introduces spatial, color, SNR, and ROI scalability for still images; 
  • MPEG-4 Visual Elementary Stream (VES) with temporal and semantic scalability;
  • MPEG-4 BSAC with fine-grained SNR scalability;
  • MPEG-4 SVC with native support for temporal, spatial, and SNR scalability.
Please note each scalable coding format is introduced with a special focus on its scalability aspects. For details regarding basic coding techniques the reader is referred to appropriate literature, e.g., [5] or [6].


JPEG2000


The JPEG2000 standard [7][8][9] is known as the successor of the world-famous and widely adopted JPEG standard [10]. The JPEG2000 standard has been developed in order to accommodate the increasing demands and additional requirements for multimedia and Internet applications. In particular, some of the most important features (with respect to scalability) the JPEG2000 standard should offer are progressive transmission by pixel accuracy and resolution as well as Region of Interest (ROI) coding and random code-stream access and processing. Progressive transmission enables the rendering of images with different resolution and pixel accuracy starting from a base version up to a high-resolution/quality version in an incremental manner, i.e., more and more data is added to the base layer by only transmitting the additional data which is required for increasing quality and/or resolution.

The hypercube model for JPEG2000, whose dimensions represent color, spatial, and SNR scalability, is depicted in Figure 2.

Figure 2. Hypercube for JPEG2000 scalability and a possible bitstream layout.

In particular, the figure shows the hypercube for JPEG2000 with its scalability dimensions and a possible bitstream layout with quality-spatial-color progression order. The gray cube represents the base layer with QCIF dimension, Y color component only, and a quality of 29 dB PSNR. In contrast, the blue cubes represent another version of the tile including more quality layers, i.e., a PSNR of 31 dB, with a CIF resolution but still only one color component, i.e., the resulting image is still a grayscale version of the original image.

MPEG-4 Visual Elementary Streams


MPEG-4 [11][12][13] also provides support for scalability in the spatial, temporal, and SNR dimensions, but only a small subset of these features, namely temporal scalability, has been adopted by industry. The spatial and SNR scalability features introduced too much coding overhead, which was the main reason for not adopting them at the time.

Temporal scalability is often also referred to as frame dropping: frames, or visual object planes (VOPs), that are not used as a reference for other frames are removed. Bi-directionally coded VOPs (B-VOPs) are not used as a reference for other frames, i.e., B-VOPs can be dropped arbitrarily. If a predictive coded VOP (P-VOP) needs to be dropped, all B-VOPs which use this P-VOP as a reference frame need to be dropped as well. The same holds for intra coded VOPs (I-VOPs), although these are usually not dropped in traditional temporal scalability scenarios.
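The dependency rule behind frame dropping can be illustrated with a small Python sketch. This is a simplified model of my own, not an MPEG-4 parser: each VOP only records its type and the indices of the VOPs it references, and the names `Vop` and `drop_with_dependents` are hypothetical. Dropping a VOP then means dropping everything that directly or indirectly depends on it, which also covers later P-VOPs predicted from a dropped P-VOP.

```python
from dataclasses import dataclass, field


@dataclass
class Vop:
    index: int
    kind: str                                  # "I", "P", or "B"
    refs: list = field(default_factory=list)   # indices of reference VOPs


def drop_with_dependents(vops, to_drop):
    """Return the indices that must be removed when `to_drop` is removed."""
    dropped = set(to_drop)
    changed = True
    while changed:                             # repeat until no new dependents appear
        changed = False
        for vop in vops:
            if vop.index not in dropped and any(r in dropped for r in vop.refs):
                dropped.add(vop.index)
                changed = True
    return dropped


# Display order: I0 B1 B2 P3 B4 B5 P6 (illustrative reference structure)
gov = [
    Vop(0, "I"),
    Vop(1, "B", [0, 3]), Vop(2, "B", [0, 3]),
    Vop(3, "P", [0]),
    Vop(4, "B", [3, 6]), Vop(5, "B", [3, 6]),
    Vop(6, "P", [3]),
]

b_only = drop_with_dependents(gov, [v.index for v in gov if v.kind == "B"])
p6 = drop_with_dependents(gov, [6])
print(sorted(b_only))  # [1, 2, 4, 5]
print(sorted(p6))      # [4, 5, 6]
```

Dropping only the B-VOPs removes nothing else, whereas dropping P6 takes B4 and B5 with it.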

Another dimension of scalability is introduced here known as semantic scalability. This additional dimension associates properties to a group of VOPs (GoVs) providing means for summarization or personalization of MPEG-4 visual resources. With respect to the scalability model, GoVs can be compared with parcels and VOPs can be seen as the data blocks. The semantic scalability is, of course, also applicable for other audio/visual coding formats including those introduced in this article.

Figure 3. Hypercube for MPEG-4 VES scalability and a possible bitstream layout.

A possible configuration for an MPEG-4 Visual Elementary Stream hypercube model is depicted in Figure 3 with two levels of scalability, namely temporal and semantic. The former is characterized by different frame rates and the latter uses terms from the Internet Content Rating Association (ICRA) for rating the violence level of the actual content. For example, level 0 indicates no violence or sports-related content, respectively. The gray block represents a base layer (e.g., a scene or even only one I-VOP) with a frame rate of 15 Hz and a violence level 0 whereas the blue block indicates a scene with violence level 2 and 20 frames per second (fps).

MPEG-4 Bit-Sliced Arithmetic Coding


The concept of bit-sliced arithmetic coding for audio coding was introduced in [14] but is also excerpted in [11][15]. It is very similar to the well-known Advanced Audio Coding (AAC) [16] scheme except that the quantized values are not Huffman coded but arithmetically coded in bit-slices. Thus, MPEG-4 Bit-Sliced Arithmetic Coding (BSAC) provides fine-grain scalability of approximately 1 kbit/s per audio channel per enhancement layer. The base layer comprises side information, scaling factors and the actual audio data according to the bit rate of the base layer. Each enhancement layer incrementally adds more and more information with respect to the bit rate and a maximum of 48 enhancement layers are allowed. Due to the small size of the enhancement layers, i.e., 20 to 60 bits per AAC frame typically representing 20 to 30 ms, which may result in undesired packetization overhead, data packets of consecutive frames can be grouped together.
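The numbers in this paragraph lend themselves to a quick back-of-the-envelope calculation. The following Python sketch only uses the figures stated above (1 kbit/s per channel per enhancement layer, at most 48 enhancement layers, frames of 20 to 30 ms). The function names and the 16 kbit/s base rate in the example are assumptions chosen for illustration. Reading the 20 to 60 bit range as "mono at 20 ms up to stereo at 30 ms" is my own interpretation.

```python
LAYER_STEP_KBPS = 1   # per channel per enhancement layer (from the text)
MAX_LAYERS = 48       # upper bound on enhancement layers (from the text)


def enhancement_layers(base_kbps_per_channel, target_kbps_per_channel):
    """Number of enhancement layers that fit into the target per-channel rate."""
    extra = target_kbps_per_channel - base_kbps_per_channel
    if extra < 0:
        raise ValueError("target is below the base layer bit rate")
    return min(extra // LAYER_STEP_KBPS, MAX_LAYERS)


def layer_bits_per_frame(frame_ms, channels=1):
    """Bits a single enhancement layer adds to one frame (kbit/s x ms = bits)."""
    return LAYER_STEP_KBPS * channels * frame_ms


# Hypothetical base layer of 16 kbit/s per channel
print(enhancement_layers(16, 24))            # 8
print(enhancement_layers(16, 100))           # 48 (capped at the maximum)
print(layer_bits_per_frame(20))              # 20
print(layer_bits_per_frame(30, channels=2))  # 60
```

With payloads this small, grouping the data of consecutive frames into one packet, as mentioned above, keeps the packet headers from dominating.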

Figure 4. Hypercube for MPEG-4 BSAC scalability and a possible bitstream layout.

A hypercube model for a stereo MPEG-4 BSAC bitstream including a possible bitstream layout is illustrated in Figure 4. The base layer is encoded at 48 kbit/s/channel and a possible adapted stereo version of the bitstream with 50 kbit/s is indicated as well.

MPEG-4 Scalable Video Coding


MPEG-4 Scalable Video Coding (SVC) [17] is being introduced as an extension of MPEG-4 Advanced Video Coding (AVC) [18] which is part 10 of the MPEG-4 family of audio/visual coding standards. MPEG-4 SVC natively supports three scalability dimensions, namely temporal, spatial, and quality (SNR).

Figure 5.  Hypercube for MPEG-4 SVC scalability and a possible bitstream layout.

In Figure 5 a hypercube model with the three scalability dimensions of MPEG-4 SVC including a possible bitstream layout is shown. In this example, the base layer provides a QCIF version at 20 Hz with a PSNR of 28 dB. Additionally, an improved version with higher temporal, spatial, and SNR resolution is indicated.

Adaptation of Scalable Bitstreams


The adaptation of scalable bitstreams can basically be organized into two categories:
  • The first category is a coding-format specific approach which, in general, is applicable to one coding format only, such as the Bitstream Extractor that is part of the Joint Scalable Video Model (JSVM). The disadvantage here is that for each coding format a separate "bitstream extractor" is needed, which becomes an issue for a growing number of instances.
  • The second category is referred to as the coding-format independent or generic approach that is applicable to all scalable coding formats but requires additional metadata [19]. As this approach is rather new and not commonly known, I will give a brief overview in the following.
Please note that a comparison between the generic and specific approach in the context of SVC is reported in [20].

Generic Multimedia Content Adaptation


This section discusses means to process (i.e., adapt, customize, manipulate, etc.) multimedia content independently of the actual coding format by utilizing XML-based metadata describing the high-level structure (i.e., syntax) of a bitstream. That is, the resulting XML document describes how the bitstream is organized at different syntactical and even semantic levels, e.g., in terms of packets, headers, layers, units, segments, shots, scenes, etc., depending on the actual application requirements. It is important to note that the XML description does not describe the bitstream on a bit-by-bit basis, i.e., it does not replace the actual bitstream but provides metadata regarding bit/byte positions of meaningful segments for the given application. Therefore, the XML description does not necessarily provide any information about the actual coding format used as only the positions and – in some cases – meanings are required for processing.

High-level Architecture of Generic Content Adaptation

Figure 6 depicts the high-level architecture of generic multimedia content adaptation which can be logically divided into two processes, namely the Description Transformation and the Bitstream Generation.

Figure 6. High-level architecture of Generic Multimedia Content Adaptation (adopted from [21]).

The description transformation process receives as an input the XML description of the source bitstream and a so-called style sheet that transforms the XML document according to the context information, e.g., the device capabilities. The output of this process is a transformed description which already reflects the bitstream segments of the target (i.e., adapted) bitstream. However, the transformed description still refers to the bit/byte positions of the source bitstream which needs to be parsed in order to generate the target bitstream within the second step of the adaptation process, i.e., the bitstream generation.
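The two steps can be sketched in a few lines of Python. Please treat this as an illustration under assumptions. The XML vocabulary below (`bitstream`, `unit`, `start`, `length`, `layer`) is made up for this example and is not the gBS Schema. A plain Python function stands in for the style sheet, and the context information is reduced to a single maximum layer number.

```python
import xml.etree.ElementTree as ET

# Hypothetical description: each <unit> gives the byte offset and length of a
# segment in the source bitstream plus the layer the segment belongs to.
DESCRIPTION = """
<bitstream>
  <unit start="0" length="4" layer="0"/>
  <unit start="4" length="3" layer="1"/>
  <unit start="7" length="4" layer="0"/>
  <unit start="11" length="3" layer="2"/>
</bitstream>
"""


def transform_description(root, max_layer):
    """Step 1: drop units above max_layer; offsets still refer to the source."""
    out = ET.Element(root.tag)
    for unit in root.findall("unit"):
        if int(unit.get("layer")) <= max_layer:
            out.append(ET.Element("unit", dict(unit.attrib)))
    return out


def generate_bitstream(source, description):
    """Step 2: copy the byte ranges named by the transformed description."""
    parts = []
    for unit in description.findall("unit"):
        start, length = int(unit.get("start")), int(unit.get("length"))
        parts.append(source[start:start + length])
    return b"".join(parts)


source = b"BASEen1BASEen2"                      # toy "bitstream"
description = ET.fromstring(DESCRIPTION.strip())
transformed = transform_description(description, max_layer=1)
adapted = generate_bitstream(source, transformed)
print(adapted)  # b'BASEen1BASE'
```

Note that neither function knows anything about the coding format; all format knowledge sits in the description.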

Please note that the description transformation and bitstream generation processes should be combined by applying appropriate implementation techniques in order to achieve the required performance. However, implementation and optimization techniques for this kind of approach are out of scope of this article and the interested reader is referred to [22-25].

Technical Solution Approaches

The literature offers several technical solution approaches for generic multimedia content adaptation which are briefly highlighted in the following:
  • (X)Flavor [26]: A Formal Language for Audio-Visual Object Representation which has been extended with XML features.
  • Bitstream Syntax Description Language (BSDL) [27]: An XML Schema-based language for constructing a Bitstream Syntax Schema (BS Schema) for a given coding format [28]. It enables the generation of a Bitstream Syntax Description (BSD) based on a given bitstream and vice versa. The generic counterpart of the coding format-specific BS Schema is referred to as gBS Schema which is fully coding format-agnostic. An XML document conforming to the gBS Schema is referred to as a generic Bitstream Syntax Description (gBSD) [29].
  • BFlavor [30]: A method that combines BSDL and XFlavor and basically uses XFlavor techniques – enhanced with BSDL concepts – to generate Java code which is used for automatic generation of BSDs.

Summary


Figure 7 gives a summary of the various multimedia content adaptation techniques presented in Part I and Part II. The summary has been adopted and extended from [31].

Figure 7. Summary of Multimedia Content Adaptation (adopted from [31]).

This is the end of Part II and I will continue in Part III with the adaptation decision-taking also known as the brain of multimedia content adaptation. Thus, stay tuned!

References:
[1] C. Timmerer, Generic Adaptation of Scalable Multimedia Resources, VDM Verlag Dr. Müller, 2008.
[2] ISO/IEC 21000-7, Information technology — Multimedia framework (MPEG-21) — Part 7: Digital Item Adaptation, October 2004.
[3] S. Lerouge, R. De Sutter, P. Lambert, and R. Van de Walle, "Fully Scalable Video Coding in Multicast Applications", Proceedings of SPIE/Electronic Imaging 2004, vol. 5308, San Jose, CA, US, 2004, pp. 555-564.
[4] D. Mukherjee, A. Said, and S. Liu, "A framework for fully format-independent adaptation of scalable bit-streams," IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Video Adaptation, vol. 15, no. 10, October 2005, pp. 1280-1290.
[5] R. Steinmetz, Multimedia-Technologie. Grundlagen, Komponenten und Systeme, Springer, Berlin, July 2000.
[6] F. Halsall, Multimedia Communications. Applications, Networks, Protocols and Standards, Addison Wesley, November 2000.
[7] ISO/IEC 15444-1:2004, Information technology — JPEG 2000 image coding system: Core coding system, 2nd edition, September 2004.
[8] D. Taubman and M. Marcellin (eds.), JPEG2000: Image Compression Fundamentals, Standards and Practice, Springer, November 2001.
[9] C. Christopoulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 Still Image Coding System: An Overview", IEEE Transactions on Consumer Electronics, vol. 46, no. 4, November 2000, pp. 1103-1127.
[10] G. K. Wallace, "The JPEG still picture compression standard", Communications of the ACM, vol. 34, no. 4, April 1991, pp. 30-44.
[11] F. Pereira and T. Ebrahimi (eds.), The MPEG-4 Book, Prentice Hall PTR, August 2002.
[12] S. Battista, F. Casalino, and C. Lande, "MPEG-4: A Multimedia Standard for the Third Millennium, Part 1", IEEE MultiMedia Magazine, vol. 6, no. 4, October-December 1999, pp. 74-83.
[13] S. Battista, F. Casalino, and C. Lande, "MPEG-4: A Multimedia Standard for the Third Millennium, Part 2", IEEE MultiMedia Magazine, vol. 7, no. 1, January-March 2000, pp. 76-84.
[14] S. Park, Y. Kim, S. Kim, and Y. Seo, "Multi-Layer Bit-Sliced Bit-Rate Scalable Audio Coding", in 103rd AES Convention, preprint 4520, New York, September 1997.
[15] H. Purnhagen, "An Overview of MPEG-4 Audio Version 2", Proceedings of AES 17th International Conference on High-Quality Audio Coding, Florence, Italy, September 1999, pp. 157-168.
[16] ISO/IEC 13818-7:2006, Information technology — Generic coding of moving pictures and associated audio information — Part 7: Advanced Audio Coding (AAC), 4th edition, January 2006.
[17] H. Schwarz, D. Marpe, T. Wiegand, "Overview of the Scalable Video Coding Extensions of the H.264/AVC Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, Sep. 2007, pp. 1103-1120.
[18] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003, pp. 560-576.
[19] C. Timmerer, M. Ransburg, and H. Hellwagner, "Generic Multimedia Content Adaptation", in: Borko Furht (ed.), Encyclopedia of Multimedia, 2nd edition, Springer, pp. 263-271, October 2008.
[20] M. Eberhard, L. Celetto, C. Timmerer, E. Quacchio and H. Hellwagner, "Performance Analysis of Scalable Video Adaptation: Generic versus Specific Approach", Proceedings of WIAMIS 2008, Klagenfurt, Austria, May 2008.
[21] C. Timmerer and H. Hellwagner, “Interoperable Adaptive Multimedia Communication”, IEEE Multimedia Magazine, vol. 12, no. 1, pp. 74-79, January-March 2005.
[22] C. Timmerer, G. Panis, and E. Delfosse, “Piece-wise Multimedia Content Adaptation in Streaming and Constrained Environments”, Proceedings of the 6th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2005), Montreux, Switzerland, April 2005.
[23] C. Timmerer, T. Frank, and H. Hellwagner, “Efficient processing of MPEG-21 metadata in the binary domain”, Proceedings of SPIE International Symposium ITCom 2005 on Multimedia Systems and Applications VIII, Boston, Massachusetts, USA, October 2005.
[24] M. Ransburg, C. Timmerer, H. Hellwagner, and S. Devillers, “Processing and Delivery of Multimedia Metadata for Multimedia Content Streaming”, Proceedings of the Workshop Multimedia Semantics - The Role of Metadata, RWTH Aachen, March 2007.
[25] M. Ransburg, H. Gressl, and H. Hellwagner, “Efficient Transformation of MPEG-21 Metadata for Codec-agnostic Adaptation in Real-time Streaming Scenarios”, Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2008), Klagenfurt, Austria, May 2008.
[26] D. Hong and A. Eleftheriadis, “XFlavor: Bridging Bits and Objects in Media Representation”, Proceedings IEEE International Conference on Multimedia and Expo (ICME), Lausanne, Switzerland, pp. 773- 776, August 2002.
[27] M. Amielh and S. Devillers, “Bitstream Syntax Description Language: Application of XML-Schema to Multimedia Content”, 11th International World Wide Web Conference (WWW 2002), Honolulu, May, 2002.
[28] G. Panis, A. Hutter, J. Heuer, H. Hellwagner, H. Kosch, C. Timmerer, S. Devillers and M. Amielh, “Bitstream Syntax Description: A Tool for Multimedia Resource Adaptation within MPEG-21”, Signal Processing: Image Communication, vol. 18, no. 8, pp. 721-747, September 2003.
[29] C. Timmerer, G. Panis, H. Kosch, J. Heuer, H. Hellwagner, and A. Hutter, “Coding format independent multimedia content adaptation using XML”, Proceedings of SPIE International Symposium ITCom 2003 on Internet Multimedia Management Systems IV, Orlando, Florida, USA, pp. 92-103, September 2003.
[30] W. De Neve, D. Van Deursen, D. De Schrijver, S. Lerouge, K. De Wolf, and R. Van de Walle, “BFlavor: A harmonized approach to media resource adaptation, inspired by MPEG-21 BSDL and XFlavor”, Signal Processing: Image Communication, vol. 21, no. 10, pp. 862-889, November 2006.
[31] B. Shen, W-T. Tan, F. Huve, “Dynamic Video Transcoding in Mobile Environments”, IEEE Multimedia, vol. 15, no. 1, Jan.-Mar. 2008, pp. 42-51.

Monday, December 21, 2009

MOBIMEDIA 2010 - Call for Papers



MOBIMEDIA 2010

6th International Mobile Multimedia Communications Conference

6-8 September 2010, Lisboa, Portugal


http://www.mobimedia.org/

CALL FOR PAPERS



SPONSORS
Sponsored by ICST
Technically co-sponsored by CREATE-NET

***

SCOPE

The focus of MOBIMEDIA 2010 is future internet for green and pervasive media.

The promise of a truly pervasive experience is to have the freedom to roam around anywhere and not be bound to a single location. However, the energy required to keep mobile devices connected to the network over extended periods of time quickly dissipates. In fact, energy is a critical resource in the design of wireless networks since wireless devices are usually powered by batteries.

From the mobile manufacturer's perspective the energy consumption problem is critical, not only technically but also taking into account the market expectations from a newly introduced technology. In fact there exists a continuously growing gap between the energy consumption of emerging radio systems and what can be achieved by battery technology evolution; scaling and circuit design progress; system level architecture progress; and thermal and cooling techniques. Without new approaches for energy saving, there is a significant threat that the 4G mobile users will be searching for power outlets rather than network access, and becoming once again bound to a single location.

There is added momentum from the European stage for green radio to globally reduce the electromagnetic radiation levels to have a better coexistence of wireless system (less interference) as well as a reduced human exposure to radiation leading to the so called Green Wireless technologies.

Hence, there is a clear need for disruptive strategies to address all aspects of power efficiency from the user devices through to the core infrastructure of the network and how these devices and equipment interact with each other.

In this framework, Mobimedia will provide an international forum where industry, researchers and academia will be able to interact and exchange experiences, ideas, and research results surrounding all aspects of power saving to envisage an environmentally friendly internet of the future.

***

TOPICS
Prospective authors are invited to submit original and unpublished technical papers on research topics including, but not limited to, the following:

  • Cooperative communications
  • Cognitive radio networks
  • Software defined radio approaches
  • Short-range communications
  • Energy efficient radio resource management and routing
  • QoE multimedia for Future Internet
  • P2P multimedia for autonomic wireless infrastructures.
  • Energy efficient reconfigurable radio transceivers
  • Network coding
  • Internet of things
  • Cross-layer and Cross-System optimization strategies for wireless multimedia
  • User and device location in next generation networks
  • Security mechanisms and schemes in cooperative networks
  • Resource and mobility management in cooperative network
  • Dynamic resource management and bandwidth assignment for multimedia in heterogeneous wireless networks.
  • IMS for Multimedia services in next generation networks

***

SUBMISSION

For submission guidelines, please visit http://www.mobimedia.org/submission.shtml

***

IMPORTANT DATES
Papers due: 23 April 2010
Notification of paper acceptance: 28 May 2010
Submission of camera-ready papers: 25 June 2010

***

ORGANIZING COMMITTEE
General Chair
Jonathan Rodriguez, Instituto de Telecomunicações, Portugal

Program Committee Co-Chairs
Rahim Tafazolli, University of Surrey, UK
Christos Verikoukis, CTTC, Spain

For a complete list of committee and board members, please visit http://www.mobimedia.org/committees.shtml


Friday, December 18, 2009

Filtering your RSS feeds and getting into trouble?

Do you also have subtle problems when performing simple filter operations on RSS feeds? If yes, well, there's still some hope. Almost since the beginning I have experienced very strange problems when filtering the RSS feed of my blog for certain categories. I know this should not be a problem but unfortunately, it is...

My first stop for RSS filtering was Yahoo! Pipes, which allows far more than simple RSS filtering; it basically allows one to "re-wire the Web". For some reason, if I filter for blog posts associated with a certain category (e.g., computingnow) I always get zero items as shown in the figure below. Very strange, as it works for other feeds but not this one. Changing the feed and/or blogging platform is not an option at the moment.


After troubleshooting for a while, it was clear that I had to search for another solution. That's why I did a Web search to find some alternatives and I found an interesting article describing six ways to filter RSS, and there are even more. I tried some (e.g., Feed Rinse just produced an error when filtering for this term and FilterMyRSS works but is currently unavailable) and after some time - quite some time actually - I wrote the following:


It's a very simple XSLT style sheet that filters for the category 'computingnow'. Next to that I've enabled a cronjob in the home directory of my Web space that executes the following script on a regular basis (e.g., every half an hour or so):
java -jar saxon9.jar -s:http://feeds.feedburner.com/MultimediaCommunication -xsl:rss-filter.xslt -o:public_html/ct4computingnow.xml
Well, that's the story of how this article is brought to you, and the feed I've filtered myself is now available here. Of course, I could make the XSLT style sheet much more flexible by adding parameter support but that's another story...
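The style sheet itself is not reproduced in this text. As a rough Python analogue of the filtering step (explicitly not the XSLT that the cron job runs), the following sketch removes every item that does not carry a given category. It assumes a plain RSS 2.0 document with un-namespaced `item` and `category` elements; the function name `filter_rss_by_category` and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def filter_rss_by_category(rss_text, category):
    """Return the feed with only those items tagged with `category`."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    for item in list(channel.findall("item")):   # copy: we mutate the channel
        categories = [c.text for c in item.findall("category")]
        if category not in categories:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<rss version="2.0"><channel><title>Demo</title>
<item><title>A</title><category>computingnow</category></item>
<item><title>B</title><category>mpeg</category></item>
<item><title>C</title><category>mpeg</category><category>computingnow</category></item>
</channel></rss>"""

filtered = filter_rss_by_category(SAMPLE, "computingnow")
titles = [item.findtext("title") for item in ET.fromstring(filtered).iter("item")]
print(titles)  # ['A', 'C']
```

Passing the category as an argument is the kind of parameter support mentioned above; a real feed may use namespaces or Atom elements, which this sketch does not handle.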

Tuesday, December 15, 2009

IPTV and Video Networks in the 2015 Timeframe: The Evolution to medianets

Interesting article which also addresses "the way out of the middleware maze".

Interestingly, the article states that "although it is unlikely that all aspects of IPTV middleware, such as DRM or user control interfaces, will be standardized to the point that open application programming interfaces (APIs) will be available for all capabilities, the primary objective of specifying a standardized framework architecture for middleware is to enable definition of some of the open APIs to enhance interoperability and thereby reduce development costs."

Well, that's exactly what ISO/MPEG's Advanced IPTV Terminal (AIT) standard is about. See my blog or http://www.chiariglione.org/mpeg/ for further details!

in reference to:

"The rapid progress being made in technologies that enable video content delivery over IP networks to consumers has prompted predictions that the evolution of IP-based next-generation networks will be largely driven by video service delivery requirements. This article surveys trends in the underlying technologies, extrapolating out to the 2015 timeframe, and drawing on the developments in standardization for IPTV, cable networks, and the IP NGN. These evolution trends lead to the notion of a medianet as a useful way to think of all of the enabling video and multimedia technologies. A medianet is essentially an IP network that is optimized to deliver video services to any or multiple display devices, and uses any of optical, cable, wireline, and wireless networks for this purpose."
- IEEE Communications Magazine

Monday, December 7, 2009

O Universal Multimedia Access, Where Art Thou? (Part I)

-by Christian Timmerer, Klagenfurt University, Austria

Preface: First I thought about writing this article for a journal or something equivalent, but then I decided to make it available through my blog. The aim is to perform an experiment in order to determine whether it is possible (a) to get direct feedback through comments and (b) to be referenced from elsewhere. As it is quite a comprehensive article, it is split into separate parts. If someone (e.g., a journal editor) is interested in publishing this article, yes, I can still do that! :-)

Part I – Introduction and Multimedia Content Adaptation Techniques


Back in 1999, an article was published in vol. 1/no. 2 of IEEE Transactions on Multimedia entitled “Adapting Multimedia Internet Content for Universal Access” [1] which can be roughly seen as the kick-off for a research effort that in subsequent papers was collectively referred to as Universal Multimedia Access (UMA). The initial aim of UMA was to provide access to multimedia content anywhere, anytime, and with any device. Now, (more than) 10 years later, I think it is worth looking back and reviewing what has been achieved so far.

Some argue and I tend to agree that the key to UMA is multimedia content adaptation [2] as depicted in Figure 1. The aim is the transformation of an input to an output in video or augmented multimedia forms utilizing manipulations at multiple levels (e.g., signal, structural, or semantic) in order to meet diverse resource constraints and user preferences while optimizing the overall utility of the multimedia content.



Figure 1. Concept of Multimedia Content Adaptation – adopted from [2].

How to adapt?

  • Temporal scaling: reduce number of frames
  • Spatial scaling: reduce number of pixels => reduce resolution
  • Frequency scaling: reduce number of DCT coefficients => reduce quality
  • Modality conversion: e.g., video to slide show also known as transmoding

Where to adapt?

  • Server, Proxy, Router, Gateway, Client, …

When to adapt?

  • A server could hold several variations of the same multimedia content – or – could react to changing (network) conditions
  • A proxy could adapt cached multimedia content in order to free space – or – could adapt it in an ad-hoc mode or on-demand
  • A router or gateway could drop marked segments (e.g. packets)
  • A client could subscribe only to those streams it can handle
  • etc.
Based on the observations above, multimedia content adaptation can be roughly categorized into adaptation by selection, adaptation by transcoding, and adaptation by transformation.

Adaptation by Selection


The idea here is to provide multiple versions of the same multimedia content and then select or switch to the most appropriate version according to the usage context. The InfoPyramid framework [1][3] was among the first approaches of this adaptation paradigm. In this framework, content descriptions are associated with individual components of the multimedia content and describe the content at different modalities, at different resolutions, and at multiple abstractions (cf. Figure 2).



Figure 2. InfoPyramid Framework – adopted from [3].

  • Multi-modal: Multimedia content is usually not in a single media format, or modality. A video clip can contain raw data from video, audio in two or more languages, closed captions, etc.
  • Multi-resolution: Each content component can also be described at multiple resolutions. Numerous resolution reduction techniques exist for constructing image and video resources.
  • Multiple-abstraction levels: The abstraction levels describe features and data in a hierarchical fashion. For example, one hierarchy could be features, semantics and object descriptions, and annotations and metadata itself.
In order to access the actual content one has to define methods for manipulating, translating, transcoding, and generating content which can happen in offline or online mode.
  • The offline mode generates variations as described by InfoPyramid before service deployment. On service request one can choose or select the prepared variations based on the InfoPyramid description. The variations are generated by applying appropriate adaptation techniques (e.g., transcoding) offline which indeed increases storage and asset management requirements.
  • The online mode provides the appropriate variation on-the-fly based on the InfoPyramid description and during the actual service request. Again, appropriate adaptation techniques (e.g., transcoding) are applied but this time online which increases processing (CPU) requirements, delay, etc.
However, in general it is difficult to anticipate and provide multimedia content for the large variety of formats, bit rates, etc. Furthermore, in offline mode one needs to maintain and manage all these different versions, which is a waste of capacity. On the other hand, this approach yields good performance and little quality degradation.
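To make the selection step a bit more concrete, here is a minimal sketch in Python. It is not the InfoPyramid data model: the `Variation` and `UsageContext` classes, their fields, and the use of bit rate as a stand-in for content value are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Variation:
    """One pre-generated version of a content item (hypothetical fields)."""
    modality: str        # e.g., "video", "image", "text"
    width: int
    height: int
    bitrate_kbps: int


@dataclass(frozen=True)
class UsageContext:
    """Terminal and network constraints of a request (hypothetical fields)."""
    modalities: FrozenSet[str]
    max_width: int
    max_height: int
    max_bitrate_kbps: int


def select_variation(variations: List[Variation],
                     ctx: UsageContext) -> Optional[Variation]:
    """Return the best variation that satisfies the usage context, or None."""
    feasible = [
        v for v in variations
        if v.modality in ctx.modalities
        and v.width <= ctx.max_width
        and v.height <= ctx.max_height
        and v.bitrate_kbps <= ctx.max_bitrate_kbps
    ]
    # Assumption: a higher bit rate means a higher content value.
    return max(feasible, key=lambda v: v.bitrate_kbps, default=None)


# Variations prepared offline, i.e., before service deployment.
prepared = [
    Variation("video", 1280, 720, 3000),
    Variation("video", 352, 240, 400),
    Variation("image", 320, 240, 50),
]

# A constrained mobile terminal requesting the content.
phone = UsageContext(frozenset({"video", "image"}), 480, 320, 500)
chosen = select_variation(prepared, phone)
```

In online mode, the same function could be used to pick the target parameters first, with the chosen variation then being generated on-the-fly instead of being read from storage.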

Adaptation by Transcoding


Although transcoding may be used as a tool within the previous adaptation paradigm, it is listed here as a separate approach due to its importance both in literature and industry [5][6][7]. The objective of transcoding is to satisfy usage environment constraints while maximizing the content value (objective/subjective quality) and minimizing the actual transcoding complexity. In general, one can distinguish between re-coding and trans-coding.

Conventional approaches – recode – perform full decoding, post-processing, and full re-encoding as shown in Figure 3. This usually yields the highest quality but is expensive, and in many (real-time) cases it requires a hardware-based solution.



Figure 3. Conventional approaches – recode.
Low-cost approaches – transcode – target similar quality as the conventional approach but with lower complexity. The focus is on architectures that utilize compressed-domain processing, which enables software solutions to be deployed (cf. Figure 4).


Figure 4. Low-cost approaches – transcode.
In the following, common transcoding operations are briefly highlighted:
  • Bit-rate reduction – sometimes also known as transrating (e.g., SDTV: 6 Mbps => 3 Mbps, HDTV: 19.2 Mbps => 11 Mbps): The main challenges here are drift compensation due to re-quantization errors, the rate control algorithm, and the trade-off between quality and complexity. A vast number of solutions have been proposed in the literature and it is nearly impossible to summarize them. Nevertheless, a good overview is given in [8].
  • Temporal resolution reduction (e.g., 30 fps => 10 fps): Frame dropping raises a couple of issues, namely how to estimate a new motion vector based on the incoming motion vectors while avoiding full motion vector re-estimation, and how to estimate a new residual based on the incoming residual values while minimizing the mismatch between predictive and residual components. Some approaches are described in [7][9].
  • Spatial resolution reduction (e.g., HDTV => SDTV; 720x480i, 30Hz => 352x240p 10Hz):
    • Motion vectors corresponding to reduced resolution reference frame => frame-based & field-based motion vector mapping.
    • Obtaining texture information for lower resolution MB’s => simple averaging (frame-based or field-based, computationally efficient) or block-based filters (typically more complex than required).
    • Drift compensation architecture due to re-quantization and down-sampling => cascaded architecture (full decoding/re-encoding), partial encoding architecture (full decoding followed by partial encoder), and intra refresh architecture (open-loop architecture).
  • Error-resilience enhancement: Improve the robustness of the bitstream for transmission or use retransmitted frames as reference even if they arrive too late to be displayed. With such an approach the error propagation is eliminated, whereas it would persist if retransmitted frames were discarded [7].
  • Syntax conversion (e.g., MPEG-2 Transport Stream => MPEG-2 Program Stream for DVD Recording; MPEG-2 => MPEG-4 for Broadcast to Mobile): This operation is often referred to as the classical transcoding operation and was/is the main driving use case for UMA. It usually benefits from the operations introduced above and is used in certain combinations. Recently, bit-stream rewriting has been introduced which allows for syntax conversion within a given family of video coding standards (e.g., SVC-to-AVC [10] or AVC-to-SVC [11]).
  • Modality conversion – sometimes also known as transmoding (e.g., video => slideshow; text => speech): The objective here is to modify the modality (e.g., audio, video, image, text) in order to satisfy transmission constraints and/or user preferences [12][13][14].
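Three of the operations above can be illustrated with a few lines of Python. This is only a sketch under simplifying assumptions and not one of the architectures from [5]-[9]: re-quantization is shown without any drift compensation, motion vectors of dropped frames are simply summed (which assumes the same block is tracked across frames), and the 2:1 downscaling case averages four incoming motion vectors and halves the result. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[float, float]  # (dx, dy) in pixels


def requantize(levels: Sequence[int], q_in: float, q_out: float) -> List[int]:
    """Bit-rate reduction: map quantized coefficient levels from step size
    q_in to a coarser step size q_out (open loop, no drift compensation)."""
    return [round(level * q_in / q_out) for level in levels]


def compose_motion_vector(chain: Sequence[MotionVector]) -> MotionVector:
    """Temporal reduction: approximate the motion vector that skips over
    dropped frames by summing the incoming per-frame motion vectors."""
    return (sum(v[0] for v in chain), sum(v[1] for v in chain))


def map_motion_vector_2to1(mvs: Sequence[MotionVector]) -> MotionVector:
    """Spatial reduction by 2 in each dimension: average the motion vectors
    of the four incoming macroblocks and scale the result by 1/2."""
    n = len(mvs)
    return (sum(v[0] for v in mvs) / (2 * n), sum(v[1] for v in mvs) / (2 * n))


# Coarser quantization yields smaller levels and more zeros to code.
coarse = requantize([10, -3, 1, 0], q_in=4, q_out=10)

# 30 fps => 10 fps: two of three frames are dropped, so three incoming
# motion vectors are combined into one.
skipped = compose_motion_vector([(2, -1), (3, 0), (1, 1)])

# Four incoming macroblocks collapse into one at the lower resolution.
mapped = map_motion_vector_2to1([(4, 2), (4, 2), (8, 2), (0, 2)])
```

The rounding in `requantize` is exactly where the re-quantization error enters. In a real transcoder this error propagates through predicted frames, which is why the drift compensation architectures mentioned above exist.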
This is the end of Part I and I will continue in Part II with the adaptation by transformation approach that utilizes scalable coding formats such as JPEG2000, MPEG-4 BSAC, and MPEG-4 SVC. Thus, stay tuned!

References:
[1] R. Mohan, J. R. Smith, C.-S. Li, “Adapting Multimedia Internet Content for Universal Access,” IEEE Transactions on Multimedia, vol. 1, no. 1, 1999, pp. 104-114.
[2] S.F. Chang and A. Vetro, “Video Adaptation: Concepts, Technologies and Open Issues”, Proceedings of the IEEE, vol. 93, no. 1, Jan. 2005, pp. 148-158.
[3] C-S. Li, R. Mohan and J.R. Smith, “Multimedia Content Description in the InfoPyramid”, Proceedings ICASSP’98, Special session on Signal Processing in Modern Multimedia Standards, Seattle, May 1998.
[4] B. Shen, W-T. Tan, F. Huve, “Dynamic Video Transcoding in Mobile Environments”, IEEE Multimedia, vol. 15, no. 1, Jan.-Mar. 2008, pp. 42-51.
[5] A. Vetro, C. Christopoulos and H. Sun, “An overview of video transcoding architectures and techniques”, IEEE Signal Processing Magazine, vol. 20, no. 2, Mar. 2003, pp. 18-29.
[6] J. Xin, C.W. Lin, M.T. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, vol. 93, no. 1, Jan. 2005, pp. 84-97.
[7] B. Shen, W-T. Tan, F. Huve, “Dynamic Video Transcoding in Mobile Environments”, IEEE Multimedia, vol. 15, no. 1, Jan.-Mar. 2008, pp. 42-51.
[8] S. Liu, A. Bovik, “Digital Video Transcoding”, in A. Bovik, The Essential Guide to Video Processing, Academic Press, 2009.
[9] Fung, et al., “New architecture for dynamic frame skipping transcoder”, IEEE Transactions on Image Processing, vol. 11, no. 8, Aug. 2002, pp. 886-900.
[10] A. Segall, J. Zhao, “Bit-stream rewriting for SVC-to-AVC conversion”, 15th International Conference on Image Processing (ICIP2008), San Diego, USA, Oct. 2008, pp. 2776-2779.
[11] J. De Cock, S. Notebaert, P. Lambert, R. Van de Walle, “Advanced bitstream rewriting from H.264/AVC to SVC”, 15th International Conference on Image Processing (ICIP2008), San Diego, USA, Oct. 2008, pp. 2472-2475.
[12] T. C. Thang, Y. J. Jung, J. W. Lee, Y. M. Ro, “Modality Conversion for Universal Multimedia Services”, Proceeding 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2004), Lisboa, Portugal, April, 2004.
[13] T. C. Thang, Y. J. Jung, and Y. M. Ro, “Modality Conversion for QoS Management in Universal Multimedia Access”, IEE Proceedings: Vision, Image & Signal Processing, vol. 152, no. 3, Jun. 2005, pp. 374-384.
[14] M.K. Asadi, J.-C. Dufourd, “Multimedia Adaptation by Transmoding in MPEG-21”, Proceeding 5th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS2004), Lisboa, Portugal, April, 2004.

Friday, December 4, 2009

Workshop on MMT (MPEG Media Transport) – Call for Contributions

MPEG has developed various technologies for multimedia transport, such as MPEG-2 TS and the MP4 file format. Both technologies have been widely accepted and are heavily used by various industries such as digital broadcasting and mobile phones. On the other hand, standardization organizations such as IETF, IEEE, and 3GPP have been providing various protocols to deliver multimedia content packetized or packaged by such MPEG transport technologies. For example, several RTP payload formats have been developed to enable the streaming of media from a server to a client over IP networks. However, the development of the payload format was separate from the codec development since, traditionally, the coding and transport of media are associated with different layers of the OSI reference model. Such separation results in the current situation where the optimal streaming solution for MPEG media relies on proprietary information exchange over RTP and its associated protocols.

In order to develop standardized and efficient solutions for the transport of MPEG media, especially given the recent increase in demand in heterogeneous network environments, MPEG is gathering information on current limitations of available standards in the area of media streaming and challenges in emerging network environments.

To overcome existing limitations and face the challenges that emerging applications impose on the requirements of MMT standardization, ISO/IEC WG11 (MPEG) plans to organize a full-day workshop in Kyoto on Wednesday, January 20 (11 a.m. to 5 p.m.), during the 91st WG11 meeting in Japan.

The key intention of the workshop is to get an overview of state-of-the-art technologies as well as to gather solid use cases and requirements for MMT. This will enable MPEG to draw conclusions about the needs and opportunities in standardizing a new transport scheme. For this purpose it is planned to invite speakers on key topics and in addition select a variety of proposed contributions. The following topics will be considered:

Industry experience of multimedia transport:

  • Delivery of media over IP networks in ptp/ptmp manner
  • Progressive download and streaming
  • MPEG TS transport between heterogeneous networks
  • Delivery and sharing of user-created content
Challenges of emerging multimedia transport environments:
  • Peer-to-Peer (P2P) traffic for IPTV services
  • Cross-layer designs to improve the Quality of Service/Experience (QoS/QoE)
  • Context- and Content-Aware Networks
  • Conversion between a stream and a randomly accessible file
The workshop will be organized as a single track of oral presentations. When planning to propose a contribution, please send a summary by 12 December 2009, including title, author(s), area(s) from the list above, and an abstract of 500 words by email to the following persons (chairmen of the MPEG systems and requirements subgroups):
  • Youngkwon Lim, young@netntv.co.kr
  • Jörn Ostermann, ostermann@tnt.uni-hannover.de
The final detailed program will be made available by 10 January 2010. Information about acceptance/rejection of the contributions will be conveyed to proponents prior to that date. Note that contributions that cannot be considered for presentation at the workshop will be reviewed during the following week at the MPEG meeting.

Wednesday, December 2, 2009

IEEE Computing Now Dec'09 Theme: Multimedia Metadata and Semantic Management

I've co-edited the December 2009 theme of IEEE Computing Now on Multimedia Metadata and Semantic Management.

Guest Editors' Introduction • Harald Kosch and Christian Timmerer • December 2009

Multimedia semantics is more than developing ontologies to describe the nature of multimedia content. It’s the key research area for interoperable, intelligent access to and management of multimedia materials.

There are many metadata standards. More than 10 organizations vie for leadership in content description, including the Dublin Core Metadata Initiative, ISO/IEC’s MPEG working group, and the World Wide Web Consortium (W3C). For a complete list, see the “Semantic Standards” sidebar.

Recent studies show that this diversity is a major hindrance to a common multimedia semantic understanding. So, the first challenge to address in this area is the heterogeneity in metadata description and query languages. We must build better bridges across semantic gaps. We also need to cleverly aggregate and concisely present results for users while providing security and related access-control techniques appropriate to multimedia content. Other challenges include synchronizing metadata information to media and vice versa and managing this relationship throughout the metadata life cycle.

Effective multimedia management must span the metadata life cycle—from its creation through processing, storage, distribution, and deployment—and work whether the metadata is tightly connected with or independent of the media it describes.

Finally, we need better integration of situational context. This includes not only domain knowledge, but also legal and cultural issues, metadata and semantic quality, compression and encryption techniques.

Combining the Semantic Web with multimedia semantics offers interesting research opportunities for social-information management, such as collaborative multimedia tagging, semantics-aware social-media engineering, and multimedia mash-ups. These opportunities were well represented at the 2009 International Conference on Semantic and Digital Media Technologies (SAMT 09, www.samt2009.org). The Virtual Campfire exemplifies emerging systems for integrating social multimedia. This project, led by Ralf Klamma at the RWTH Aachen (www.dbis.rwth-aachen.de/lehrstruhl/projects/virtualCampfire), establishes an advanced framework to create, search, and share multimedia artifacts with context awareness across communities.

Selected Articles on Multimedia Semantics

This month’s theme includes the following featured articles:

In “Managing and Querying Efficiently Distributed Semantic Multimedia Metadata Collections” (IEEE MultiMedia, Oct.–Dec. 2009, pp. 12–20, special issue on Multimedia Metadata and Semantic Management), Sébastien Laborie, Ana-Maria Manzat, and Florence Sèdes propose an original model of a centralized metadata resume. Their resume is a concise version of the whole metadata, and it can link to some desired multimedia content on remote servers and databases. The authors also propose an automatic construction process for the metadata resume. They demonstrate the framework with current Semantic Web technologies for representing and querying semantic metadata. Their experimental results show the benefits of their approach.

In “Semantic MPEG Query Format Validation and Processing,” also from IEEE MultiMedia’s special issue (Oct.–Dec. 2009, pp. 22–33) Mario Doeller, Ruben Tous, Matthias Gruhne, Miran Choi, Tae-Beom Lim, Jaime Delgado, and Armelle Yakou describe the semantic validation of the MPEG Query Format (MPQF) and the implementation of an MPQF engine on top of an Oracle database management system. MPQF enables interoperable querying among heterogeneous databases that use different metadata standards for describing multimedia content. This article introduces methods for evaluating MPQF semantic-validation rules not expressed by syntactic means within the XML Schema used by the databases. The authors highlight a prototype implementation of an MPQF-capable processing engine using QueryByFreeText, QueryByXQuery, QueryByDescription, and QueryByMedia query types on a set of MPEG-7 based image annotations.

In “Using Social Networking and Collections to Enable Video Semantics Acquisition,” a third article from IEEE MultiMedia’s special issue (Oct.–Dec. 2009, pp. 52–60), Stephen Davis, Ian Burnett, and Christian Ritz consider the multimedia value chain’s first elements: media production, acquisition, and metadata gathering. The authors bring together methods from video content annotation and social networking to solve problems associated with gathering metadata that describes user interactions with and opinions about video content. Then they aggregate individual users’ interaction metadata to form semantic metadata for a given video. The authors have successfully implemented their techniques in a custom Flex application based on the popular Facebook API.

In “The Ariadne Infrastructure for Managing and Storing Metadata” (IEEE Internet Computing, July/Aug. 2009, pp. 18–25), Stefaan Ternier, Katrien Verbert, Gonzalo Parra, Bram Vandeputte, Joris Klerkx, Erik Duval, Vicente Ordóñez, and Xavier Ochoa analyze the standards-based Ariadne infrastructure for managing learning objects in an open, scalable architecture. The core infrastructure comprises several components such as the repository, federated search engine, finder, harvester, and metadata validation service. This infrastructure enables the integration of learning objects in multiple, distributed repository networks. Finally, the authors review several architectural patterns that they found useful in searching repositories in this area—namely, federated search, search on harvest, search adapter, and harvest adapter. It would be interesting to see this infrastructure working with multimedia metadata.

In “Data-Sharing P2P Networks with Semantic Approximation Capabilities” (IEEE Internet Computing, Sept./Oct. 2009, pp. 60–70), Federica Mandreoli, Riccardo Martoglia, Simona Sassatelli, and Wilma Penzo tackle the new information-retrieval challenges posed by heterogeneous data representations within peer-to-peer systems. The authors suggest leveraging the presence of semantic approximations between peers’ schemas to improve query routing. Their approach identifies the peers that best satisfy a user’s query and ranks the answers through a mechanism that promotes the most semantically relevant results. Their work applies to a scenario in which various actors in a multimedia chain-of-value network (such as network and telecom operators and service providers) must actively collaborate.

In “3D Media and the Semantic Web” (IEEE Intelligent Systems, Mar./Apr. 2009, pp. 90–96), Michela Spagnuolo and Bianca Falcidieno introduce ways to integrate 3D media with Semantic Web technologies. Tools for coding, extracting, sharing, and retrieving the semantic content of 3D media are still far from satisfactory. The authors describe a means for embedding 3D into the Semantic Web, documenting and annotating 3D media for sharing, understanding its meaning, and retrieving it on the basis of content.

Related Resources

Numerous other articles from a wide range of journals and conferences deal with topics related to Multimedia Semantics; see our accompanying list of recommendations.

We’d also like to know what you think about Multimedia Semantics, so take this month’s poll and voice your opinion.

Harald Kosch is a full professor at the Faculty of Informatics and Mathematics, University of Passau, Germany. His research interests include multimedia metadata, multimedia databases, middleware, and Internet applications. Kosch has a PhD in computer science from Ecole Normale Supérieure de Lyon, France. Contact him at Harald.Kosch@uni-passau.de.


Christian Timmerer is an assistant professor in the Department of Information Technology, Multimedia Communication Group, Klagenfurt University, Austria. His research interests include the transport of multimedia content, multimedia adaptation in constrained and streaming environments, distributed multimedia adaptation, and Quality of Service/Quality of Experience. Timmerer has a PhD in applied informatics from Klagenfurt University. Contact him at christian.timmerer@itec.uni-klu.ac.at. Publications and MPEG contributions can be found under http://research.timmerer.com, follow him on http://www.twitter.com/timse7, and subscribe to his blog http://blog.timmerer.com.