Increasing throughput rates and technical developments in video streaming over the Internet offer an attractive solution for the distribution of immersive 3D multi-view video. Nevertheless, robust video streaming depends on efficient error resilience and content-aware adaptation techniques. Dynamic network characteristics that cause frequent congestion may prevent video packets from being delivered in a timely manner. Packet delivery failures may become prominent, significantly degrading the immersive 3D video experience. To overcome this problem, a novel view recovery technique for 3D free-viewpoint video is introduced to maintain 3D video quality in a cost-effective manner. In this concept, views that are discarded (and hence undelivered) as a result of adaptation in the network are recovered with high quality at the receiver side, using Side Information (SI) and the delivered frames of neighbouring views. The proposed adaptive 3D multi-view video streaming scheme is tested using the Dynamic Adaptive Streaming over HTTP (MPEG-DASH) standard. Tests have revealed that the perceptual 3D video quality under adverse network conditions is significantly improved thanks to the use of the extra side information in view recovery.
ACTION-TV proposes an innovative mode of user interaction for broadcasting to relax the rigid and passive nature of present broadcasting ecosystems. It has two key aims:
– A group of users can take part in TV shows providing a sense of immersion into the show and seamless engagement with the content;
– Users are encouraged to use TV shows as a means of social engagement and of keeping themselves and their talents more visible across social circles.
These aims will be achieved by developing an advanced digital media access and delivery platform that augments traditional audio-visual broadcasts with novel interactivity elements to encourage natural engagement with the content. Mixed-reality technologies will be developed to insert users into pre-recorded content, which will be made ‘responsive’ to users’ actions by using a set of auxiliary streams. The potential of media cloud technologies will be harnessed to personalise ACTION-TV-enabled broadcast content for a group of collaborating users based on their actions. As a result, content producers will, for the first time, be able to generate creative media applications with richer content-level user interactivity. Cloud-service providers will be able to monetise their infrastructure by leveraging the increased demand for strategically located in-network media processing. Participating users will be able to share personalised content with their social peers. In this way, end users will have access to more engaging personalised content and will be able to socialise with community members who share common interests. ACTION-TV supports a range of applications, from an individual trying out a garment in a TV advert to a group of users interactively attending a TV talent show with the convenience of staying at home. However, the ways of utilising the proposed interactivity concept are endless and limited only by the imagination of inspiring content producers.
Conventionally, a video encoder is optimised for efficient bandwidth utilisation in video communications, where the distortion due to lossy compression is minimised for a given affordable compressed data rate. However, over the past decade video utilisation has evolved to include content-based industrial applications in other domains, such as security and control systems. Similarly, in multimedia applications there is an increasing demand for content-based functionalities for video organisation and flexible access.
In real-time scenarios, these applications can exploit information embedded in the compressed video to fulfil the demand for efficient video content analysis. However, compressed-domain video analysis remains a challenge because of sparsity and noise in the compressed features. This is due to conventional encoder implementations, which are limited to optimising compression and therefore do not necessarily produce content-descriptive compressed features. Compression efficiency is critical for optimum use of bandwidth and storage resources. On the other hand, other aspects of video utilisation, such as video content-based applications, would benefit from enhanced accuracy of content representation in the compressed video stream.
In order to achieve fast and reliable video content analysis, this thesis investigates alternatives to conventional video encoding that would enhance the accuracy of compressed features, while maintaining compliance with the mainstream video coding standards. A generic Application-Aware Video Coding framework is proposed, which incorporates the accuracy of compressed features in parallel with the rate-distortion optimisation criterion.
By considering encoder motion estimation for temporal prediction, the proposed framework was evaluated in three stages. A region-based video encoder optimisation criterion was developed, to identify and encode foreground regions using accurate motion data. The optimisation is steered by a hierarchical motion estimation based on intensity-gradients. This was then extended as a motion accuracy constrained rate-distortion optimisation, using spatial and temporal correlation of motion activity in the local neighbourhood, to accommodate multimodal motion.
Finally, an unconstrained optimisation model that combines Rate-Distortion and Motion-Description-Error was developed, leading to fully scalable implementation of the framework. A motion calibrated synthetic data set covering different scene complexities was designed to analyse the framework under known motion content. A mathematical model for Motion-Description-Error was derived as a function of optimisation parameters, scene complexity and encoder configuration. It is demonstrated that the proposed optimisation framework can reduce the extent of noise in estimated motion by 50%-60%, without compromising on rate-distortion performance or encoder complexity.
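The idea of an unconstrained model that combines rate, distortion and motion description error can be illustrated with a minimal sketch. The exact cost function of the thesis is not given here, so the weighted sum below is an assumption: a conventional Lagrangian cost D + lam * R extended with a hypothetical term mu * E, where E stands for the motion description error of a candidate. The names `Candidate`, `joint_cost`, `select_candidate`, `lam` and `mu` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One motion vector candidate for a block (all fields are illustrative)."""
    mv: tuple            # candidate motion vector (dx, dy)
    distortion: float    # prediction distortion D, e.g. SAD or SSD
    rate: float          # bits R needed to code the vector and residual
    motion_error: float  # assumed motion description error E


def joint_cost(c, lam, mu):
    """Assumed unconstrained cost: J = D + lam * R + mu * E."""
    return c.distortion + lam * c.rate + mu * c.motion_error


def select_candidate(candidates, lam, mu):
    """Return the candidate with the lowest joint cost."""
    return min(candidates, key=lambda c: joint_cost(c, lam, mu))


candidates = [
    Candidate(mv=(0, 0), distortion=100.0, rate=2.0, motion_error=5.0),
    Candidate(mv=(3, 1), distortion=110.0, rate=4.0, motion_error=0.0),
]

# With mu = 0 the choice reduces to plain rate-distortion optimisation.
rd_choice = select_candidate(candidates, lam=1.0, mu=0.0)
# A non-zero mu lets the motion accuracy term change the decision.
joint_choice = select_candidate(candidates, lam=1.0, mu=4.0)
```

Setting `mu` to zero recovers a conventional rate-distortion decision, so in this sketch the weight acts as a single knob that trades compression efficiency against motion accuracy.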
We propose a new approach for fast compressed-domain analysis that utilises motion data from the encoded bit-streams in order to achieve low-complexity object tracking in surveillance videos. The algorithm estimates the trajectory of video objects by using compressed-domain motion vectors extracted directly from standard H.264/MPEG-4 Advanced Video Coding (AVC) and Scalable Video Coding (SVC) bit-streams. The experimental results show comparable tracking precision when evaluated against standard algorithms in the uncompressed domain, while maintaining low computational complexity and fast processing time. This makes the algorithm suitable for real-time and streaming applications where good estimates of object trajectories have to be computed quickly.
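As a rough illustration of trajectory estimation from motion vectors, the sketch below moves a bounding box by the median motion vector of the blocks it overlaps. This is not the algorithm evaluated above, only a simplified stand-in. It assumes that the vectors have already been parsed from the bit-stream into a per-frame grid and converted to forward displacements in pixels (decoders usually expose vectors that point back to the reference frame, so a sign change may be needed). The function name `track` and the 16-pixel block size are assumptions.

```python
from statistics import median


def track(box, mv_fields, block=16):
    """Follow a box through a sequence of motion vector fields.

    box       -- (x, y, w, h) in pixels in the first frame
    mv_fields -- one grid per frame; grid[row][col] is a (dx, dy) tuple
    Returns the list of box centres, starting with the initial one.
    """
    x, y, w, h = box
    trajectory = [(x + w / 2, y + h / 2)]
    for field in mv_fields:
        rows, cols = len(field), len(field[0])
        # Range of block indices overlapped by the current box.
        c0 = max(0, int(x // block))
        c1 = min(cols - 1, int((x + w - 1) // block))
        r0 = max(0, int(y // block))
        r1 = min(rows - 1, int((y + h - 1) // block))
        vectors = [field[r][c]
                   for r in range(r0, r1 + 1)
                   for c in range(c0, c1 + 1)]
        if vectors:
            # The median is less sensitive to noisy vectors than the mean.
            x += median(v[0] for v in vectors)
            y += median(v[1] for v in vectors)
        trajectory.append((x + w / 2, y + h / 2))
    return trajectory


# A 4x4 grid of blocks in which only the block at row 1, column 1 moves.
field = [[(0, 0)] * 4 for _ in range(4)]
field[1][1] = (4, 2)
path = track((16, 16, 16, 16), [field])
```

Because only the already decoded vectors are touched, the per-frame cost grows with the number of blocks under the box rather than with the number of pixels, which is the source of the low complexity claimed for compressed-domain tracking.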
This work introduces a framework for video summarisation and browsing by utilising inherently hierarchical compressed-domain features of scalable video and efficient dynamic video summarisation. This approach enables instant adaptability of generated video summaries to available channel bandwidth as well as display resources. By utilising compressed domain features an efficient hierarchical analysis of motion activity at different layers of complexity is achieved. Exploiting a contour evolution algorithm, a scale space of temporal video descriptors is generated, enabling rapid video summarisation. Given the spatial resources of the terminal display and generated video summary, the final browsing layout is generated utilising an unsupervised robust spectral clustering technique and a fast discrete optimisation algorithm. Results show excellent scalability of the video summaries and good algorithm efficiency.
This project developed a real-time algorithm for scene change detection that analyses the statistics of the macroblock features extracted directly from the MPEG-2 stream. A method for extraction of the continuous frame difference that transforms the 3D video stream into a 1D curve is presented. This transform is then further employed to extract temporal units within the analysed video sequence.
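The reduction of a video stream to a 1D curve can be sketched as follows. The specific macroblock statistics used in the project are not stated here, so the sketch assumes that each frame is summarised by a histogram of macroblock features (for example, counts of intra, forward-predicted and skipped macroblocks) that has already been extracted from the stream. The functions `difference_curve` and `detect_cuts`, and the mean-plus-k-standard-deviations threshold, are illustrative assumptions rather than the project's method.

```python
from statistics import mean, pstdev


def difference_curve(histograms):
    """Map a sequence of per-frame histograms to a 1D difference curve.

    Entry i is the L1 distance between the normalised histograms of
    frame i and frame i + 1.
    """
    curve = []
    for prev, cur in zip(histograms, histograms[1:]):
        p_total = sum(prev) or 1  # guard against empty histograms
        c_total = sum(cur) or 1
        curve.append(sum(abs(a / p_total - b / c_total)
                         for a, b in zip(prev, cur)))
    return curve


def detect_cuts(curve, k=2.0):
    """Return indices of frames that start a new temporal unit.

    A cut is declared wherever the curve exceeds mean + k * std.
    """
    if not curve:
        return []
    threshold = mean(curve) + k * pstdev(curve)
    return [i + 1 for i, d in enumerate(curve) if d > threshold]


# Ten synthetic frames: the macroblock statistics change at frame 6.
frames = [[90, 10, 0]] * 6 + [[10, 10, 80]] * 4
curve = difference_curve(frames)
cuts = detect_cuts(curve)
```

A global threshold is the simplest choice; a sliding-window threshold would adapt better to sequences whose activity level changes over time.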