Application-Aware Video Coding

Conventionally, a video encoder is optimised for efficient bandwidth utilisation in video communications: the distortion due to lossy compression is minimised subject to the affordable compressed data rate. However, video utilisation has evolved over the past decade towards video content-based industrial applications in other domains such as security and control systems. Similarly, in multimedia applications there is an increasing demand for content-based functionalities for video organisation and flexible access.
In real-time scenarios, these applications can exploit information embedded in the compressed video to meet the demand for efficient video content analysis. However, compressed-domain video analysis remains a challenge because of sparsity and noise in the compressed features: conventional encoder implementations are limited to optimising compression, which does not necessarily produce content-descriptive compressed features. Compression efficiency is critical for optimal use of bandwidth and storage resources. On the other hand, other aspects of video utilisation, such as video content-based applications, would benefit from enhanced accuracy of content representation in the compressed video stream.
In order to achieve fast and reliable video content analysis, this thesis investigates alternatives to conventional video encoding that would enhance the accuracy of compressed features, while maintaining compliance with the mainstream video coding standards. A generic Application-Aware Video Coding framework is proposed, which incorporates the accuracy of compressed features in parallel with the rate-distortion optimisation criterion.
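As an illustration only (the symbols below are not taken from the thesis), such a criterion can be pictured as an encoder mode-decision cost in which a feature-accuracy penalty sits alongside the usual Lagrangian rate-distortion terms:

    J(mode) = D(mode) + λ·R(mode) + μ·A(mode)

where D and R are the distortion and rate of a candidate coding mode, A measures how poorly the resulting compressed features (for example, motion vectors) describe the underlying content, and λ and μ are weighting parameters.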
Focusing on encoder motion estimation for temporal prediction, the proposed framework was evaluated in three stages. First, a region-based video encoder optimisation criterion was developed to identify and encode foreground regions using accurate motion data, with the optimisation steered by a hierarchical motion estimation based on intensity gradients. This was then extended to a motion-accuracy-constrained rate-distortion optimisation, using the spatial and temporal correlation of motion activity in the local neighbourhood to accommodate multimodal motion.
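The following Python sketch illustrates the coarse-to-fine principle behind hierarchical motion estimation, using plain block matching over a decimated image pyramid; it is a generic illustration with assumed block sizes and search ranges, not the intensity-gradient-steered estimator developed in the thesis:

    import numpy as np

    def block_sad(cur, ref, bx, by, dx, dy, bs):
        # Sum of absolute differences between the block at (bx, by) in `cur`
        # and the block displaced by (dx, dy) in `ref`; out-of-frame candidates are rejected.
        h, w = ref.shape
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + bs > w or y0 + bs > h:
            return np.inf
        a = cur[by:by + bs, bx:bx + bs].astype(np.int32)
        b = ref[y0:y0 + bs, x0:x0 + bs].astype(np.int32)
        return int(np.abs(a - b).sum())

    def hierarchical_me(cur, ref, bx, by, bs=16, levels=3, radius=4):
        # Coarse-to-fine search: estimate the motion vector at the coarsest pyramid
        # level, then double and refine it at each finer level.
        pyr_cur, pyr_ref = [cur], [ref]
        for _ in range(levels - 1):
            pyr_cur.append(pyr_cur[-1][::2, ::2])   # simple decimation; a filtered pyramid is more robust
            pyr_ref.append(pyr_ref[-1][::2, ::2])
        mv = (0, 0)
        for lvl in range(levels - 1, -1, -1):
            scale = 2 ** lvl
            if lvl < levels - 1:
                mv = (mv[0] * 2, mv[1] * 2)          # propagate the coarser estimate to this level
            c, r = pyr_cur[lvl], pyr_ref[lvl]
            cbx, cby, cbs = bx // scale, by // scale, max(bs // scale, 2)
            best, best_mv = np.inf, mv
            for dy in range(mv[1] - radius, mv[1] + radius + 1):
                for dx in range(mv[0] - radius, mv[0] + radius + 1):
                    cost = block_sad(c, r, cbx, cby, dx, dy, cbs)
                    if cost < best:
                        best, best_mv = cost, (dx, dy)
            mv = best_mv
        return mv   # motion vector (dx, dy) in full-resolution pixels

In a real encoder such a search would run per block, and the resulting vectors would feed the region-based rate-distortion decision described above.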
Finally, an unconstrained optimisation model that combines Rate-Distortion and Motion-Description-Error was developed, leading to a fully scalable implementation of the framework. A motion-calibrated synthetic data set covering different scene complexities was designed to analyse the framework under known motion content. A mathematical model for Motion-Description-Error was derived as a function of the optimisation parameters, scene complexity and encoder configuration. It is demonstrated that the proposed optimisation framework can reduce the extent of noise in the estimated motion by 50% to 60%, without compromising rate-distortion performance or encoder complexity.
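A minimal sketch of how such an evaluation could be wired together, assuming the encoder motion field and the calibrated ground truth are available as per-block vector arrays; the function names, the endpoint-error form of the measure and the weights are assumptions for illustration, not the thesis' exact definitions:

    import numpy as np

    def motion_description_error(encoder_mvs, ground_truth_mvs):
        # Mean Euclidean deviation between the encoder's motion field and the calibrated
        # ground-truth motion; both fields are (rows, cols, 2) arrays of (dx, dy) vectors.
        diff = encoder_mvs.astype(np.float64) - ground_truth_mvs.astype(np.float64)
        return float(np.linalg.norm(diff, axis=-1).mean())

    def combined_cost(distortion, rate, mde, lam=0.85, mu=0.5):
        # Unconstrained cost joining the usual rate-distortion term with a
        # Motion-Description-Error penalty; lam and mu are illustrative weights.
        return distortion + lam * rate + mu * mde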