Application-Aware Video Coding

Conventionally, a video encoder is optimised for efficient bandwidth utilisation in video communications, where the distortion due to lossy compression is minimised for a given affordable compressed data rate. However, over the past decade video use has expanded to content-based industrial applications in other domains, such as security and control systems. Similarly, in multimedia applications there is an increasing demand for content-based functionalities for video organisation and flexible access.
In real-time scenarios, these applications can exploit information embedded in the compressed video to fulfil the demand for efficient video content analysis. However, compressed-domain video analysis remains a challenge because of sparsity and noise in the compressed features. This stems from conventional encoder implementations, which are limited to optimising compression and therefore do not necessarily produce content-descriptive compressed features. Compression efficiency is critical for optimum use of bandwidth and storage resources. On the other hand, other aspects of video utilisation, such as video content-based applications, would benefit from enhanced accuracy of content representation in the compressed video stream.
In order to achieve fast and reliable video content analysis, this thesis investigates alternatives to conventional video encoding that would enhance the accuracy of compressed features, while maintaining compliance with the mainstream video coding standards. A generic Application-Aware Video Coding framework is proposed, which incorporates the accuracy of compressed features in parallel with the rate-distortion optimisation criterion.
By considering encoder motion estimation for temporal prediction, the proposed framework was evaluated in three stages. A region-based video encoder optimisation criterion was developed, to identify and encode foreground regions using accurate motion data. The optimisation is steered by a hierarchical motion estimation based on intensity gradients. This was then extended as a motion-accuracy-constrained rate-distortion optimisation, using spatial and temporal correlation of motion activity in the local neighbourhood, to accommodate multimodal motion.
Finally, an unconstrained optimisation model that combines Rate-Distortion and Motion-Description-Error was developed, leading to a fully scalable implementation of the framework. A motion-calibrated synthetic data set covering different scene complexities was designed to analyse the framework under known motion content. A mathematical model for Motion-Description-Error was derived as a function of optimisation parameters, scene complexity and encoder configuration. It is demonstrated that the proposed optimisation framework can reduce the extent of noise in estimated motion by 50%-60%, without compromising on rate-distortion performance or encoder complexity.
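As a rough illustration of how a motion-accuracy term might sit alongside rate and distortion in block motion search, the following Python sketch minimises a combined cost D + lam*R + mu*E over a full-search window. It is not the thesis's model: the function name select_motion_vector, the weights lam and mu, the use of SAD for distortion, the L1 distance to a predictor as a rate proxy, and the squared distance to a reference vector ref_mv as the motion-description error are all assumptions made for this example.

```python
import numpy as np

def select_motion_vector(cur_block, ref_frame, top, left, search,
                         pred_mv, ref_mv, lam=4.0, mu=8.0):
    """Full-search block matching with an illustrative combined cost.

    cost = D + lam * R + mu * E, where (all hypothetical choices):
      D = sum of absolute differences (SAD) against the candidate block,
      R = L1 distance of the candidate from the predictor pred_mv (rate proxy),
      E = squared distance of the candidate from ref_mv, a stand-in for
          the motion-description error against a trusted motion estimate.
    Returns the best (dy, dx) and its cost.
    """
    h, w = cur_block.shape
    cur = cur_block.astype(np.int64)
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + h, x:x + w].astype(np.int64)
            distortion = np.abs(cur - cand).sum()
            rate = abs(dy - pred_mv[0]) + abs(dx - pred_mv[1])
            mde = (dy - ref_mv[0]) ** 2 + (dx - ref_mv[1]) ** 2
            cost = distortion + lam * rate + mu * mde
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

Setting mu to zero reduces the sketch to an ordinary rate-distortion search, while increasing mu pulls the chosen vector towards the reference motion, most visibly in flat regions where many candidates have similar distortion.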

Compressed domain video object tracking

We propose a new approach to fast compressed-domain analysis that uses motion data from encoded bit-streams to achieve low-complexity object tracking in surveillance videos. The algorithm estimates the trajectory of video objects using compressed-domain motion vectors extracted directly from standard H.264/MPEG-4 Advanced Video Coding (AVC) and Scalable Video Coding (SVC) bit-streams. The experimental results show tracking precision comparable to that of standard uncompressed-domain algorithms, while maintaining low computational complexity and fast processing time. This makes the algorithm suitable for real-time and streaming applications where good estimates of object trajectories have to be computed quickly.
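To make the idea of tracking from motion vectors concrete, here is a minimal Python sketch that shifts a bounding box by the median motion vector of the blocks it covers, frame by frame. It is only one plausible reading of the approach, not the published algorithm. The function name track_box, the 16-pixel block size, the median rule, and the convention that each field stores the forward displacement (dy, dx) of block content in pixels are assumptions; motion vectors parsed from a real bit-stream would first need to be converted to that convention.

```python
import numpy as np

def track_box(mv_fields, box, block=16):
    """Propagate a bounding box through a sequence of block motion fields.

    mv_fields: iterable of arrays shaped (rows, cols, 2), one per frame,
               holding an assumed forward displacement (dy, dx) per block.
    box:       (top, left, height, width) of the object in the first frame.
    Returns the list of box centres: the initial one plus one per field.
    """
    top, left, h, w = (float(v) for v in box)
    centres = [(top + h / 2, left + w / 2)]
    for mv in mv_fields:
        rows, cols = mv.shape[:2]
        # Range of block indices overlapped by the current box.
        r0 = max(int(top // block), 0)
        r1 = min(int((top + h - 1) // block) + 1, rows)
        c0 = max(int(left // block), 0)
        c1 = min(int((left + w - 1) // block) + 1, cols)
        if r0 < r1 and c0 < c1:
            # The median is less sensitive to noisy vectors than the mean.
            dy, dx = np.median(mv[r0:r1, c0:c1].reshape(-1, 2), axis=0)
            top += float(dy)
            left += float(dx)
        centres.append((top + h / 2, left + w / 2))
    return centres
```

Because the sketch touches only the decoded motion field and never the pixel data, its per-frame cost grows with the number of blocks under the box, which is the kind of saving that compressed-domain tracking aims for.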


Fast analysis of scalable video for adaptive browsing interfaces

This work introduces a framework for video summarisation and browsing that combines the inherently hierarchical compressed-domain features of scalable video with efficient dynamic video summarisation. This approach enables instant adaptability of the generated video summaries to the available channel bandwidth as well as display resources. By utilising compressed-domain features, an efficient hierarchical analysis of motion activity at different layers of complexity is achieved. Exploiting a contour evolution algorithm, a scale space of temporal video descriptors is generated, enabling rapid video summarisation. Given the spatial resources of the terminal display and the generated video summary, the final browsing layout is produced using an unsupervised robust spectral clustering technique and a fast discrete optimisation algorithm. Results show excellent scalability of the video summaries and good algorithm efficiency.