Neural Network Compression for Multimedia Applications – WG11 (MPEG) progresses to Committee Draft
Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors, and image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), which makes them considerably large. Distributing them to many clients that use them in applications (e.g., mobile phones, smart cameras) therefore requires a compressed representation of neural networks.
WG11 (MPEG) has completed the Committee Draft (CD) of the specification at its 131st meeting. Because the compression of neural networks is likely to have both hardware-dependent and hardware-independent components, the standard is designed as a toolbox of compression technologies. The specification contains different methods for parameter sparsification, parameter reduction (e.g., matrix decomposition), parameter quantization, and entropy coding. These methods can be assembled into encoding pipelines that combine one method from each group, or more than one in the case of sparsification and reduction. The results show that trained neural networks for many common multimedia problems, such as image or audio classification or image compression, can be compressed to 10% of their original size with no or very small performance loss, and to an even smaller size at a small performance loss. The specification is independent of any particular neural network exchange format, and interoperability with common formats is described in the annexes.
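To make the toolbox idea concrete, the following is a minimal illustrative sketch of a pipeline that chains one method from three of the groups named above: sparsification, quantization, and entropy coding. It is not taken from the specification. Magnitude pruning, uniform 8-bit quantization, and zlib's DEFLATE (used here only as a stand-in for an entropy coder) are assumptions chosen for brevity, as are all function names, the random test weights, and the parameter values. The ratio it prints comes from toy data and says nothing about the results reported for the standard.

```python
import random
import struct
import zlib


def sparsify(weights, keep_ratio):
    """Magnitude pruning (assumed method): zero all but the largest weights."""
    keep = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, step):
    """Uniform scalar quantization to signed 8-bit integer indices."""
    return [max(-127, min(127, round(w / step))) for w in weights]


def dequantize(indices, step):
    """Map integer indices back to approximate weight values."""
    return [i * step for i in indices]


def entropy_code(indices):
    """Stand-in for an entropy coder: pack as int8, then DEFLATE-compress."""
    return zlib.compress(struct.pack(f"{len(indices)}b", *indices), 9)


def entropy_decode(payload, count):
    """Inverse of entropy_code: decompress and unpack the int8 indices."""
    return list(struct.unpack(f"{count}b", zlib.decompress(payload)))


# Hypothetical "trained" weights: random values, for illustration only.
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]
step = 0.005  # assumed quantization step size

# Encoder: one method from each group, applied in sequence.
pruned = sparsify(weights, keep_ratio=0.3)
indices = quantize(pruned, step)
payload = entropy_code(indices)

# Decoder: undo entropy coding and quantization (pruning is not reversible).
restored = dequantize(entropy_decode(payload, len(indices)), step)

original_bytes = 4 * len(weights)  # assumes float32 storage of the original
print(f"compressed to {len(payload) / original_bytes:.1%} of float32 size")
```

Swapping any single stage, for example replacing pruning with a matrix decomposition, leaves the other stages unchanged, which is the property a toolbox design is meant to provide.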