This research comprises a series of user experience studies in Human-Computer Interaction (HCI) that (i) analyse user aspects of stereoscopic 3D video interaction, (ii) propose technical solutions, and (iii) give design guidelines for intuitive interaction with stereoscopic 3D video content.
One of the main emerging challenges for future multimedia platforms is the development of three-dimensional (3D) display technology, which has prompted a wide range of research activity in the video research community. This technology can offer the end user a genuinely new, immersive 3D viewing experience. However, research into meaningful user interaction with 3D content is still at an early stage.
With this in mind, the main aim of this research is to investigate, and provide a comprehensive understanding of, how to develop an interactive 3D video platform that delivers intuitive interaction with 3D video content. The key elements of the proposed platform are effective interaction with the content and the design of appropriate UI modalities. To specify the requirements for these designs, a number of studies are being conducted into the implications of the 3D content delivery mechanism and into best user practices.
Intuitive interfaces have become increasingly important in multimedia applications, from personal photo collections to professional management systems. This research presents a novel, intuitive interactive interface for browsing large image and video collections that visualises the underlying structure of the dataset through size and spatial relations on screen. To achieve this, images/frames are first clustered using an unsupervised graph-based clustering algorithm. By selecting images in a hierarchical screen layout, the user can navigate the collection intuitively. Experimental results demonstrate a significant speed-up in a content search scenario compared with a standard browsing interface, as well as the inherent intuitiveness of the system.
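The clustering step can be illustrated with a minimal sketch. The specific graph-based algorithm is not stated, so the code assumes a simple stand-in: each image is a node, an edge joins two images whose feature vectors (hypothetically, colour histograms) have cosine similarity at or above an assumed `threshold`, and clusters are the connected components of that graph, found with union-find. The function names and threshold are illustrative only.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the row vectors of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    return unit @ unit.T

def graph_clusters(features, threshold=0.9):
    """Label items by connected component of a thresholded similarity graph.

    Hypothetical stand-in for an unsupervised graph-based clustering:
    items i and j are linked when similarity(i, j) >= threshold.
    Returns one integer label per item, numbered in order of first appearance.
    """
    sim = cosine_similarity_matrix(np.asarray(features, dtype=float))
    n = sim.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

For example, `graph_clusters([[1, 0], [0.99, 0.1], [0, 1], [0.05, 1]])` groups the first two and the last two vectors. Cluster sizes could then inform how prominently each cluster is shown in the layout, though the exact mapping used by the interface is not given.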
Our user evaluation of the FreeEye browsing interface was published online as a pre-print and can be found at:
Conventionally, a video encoder is optimised for efficient bandwidth utilisation in video communications: the distortion due to lossy compression is minimised subject to the affordable compressed data rate. However, over the past decade video utilisation has expanded to content-based industrial applications in other domains, such as security and control systems. Similarly, multimedia applications increasingly demand content-based functionality for organising and flexibly accessing video.
In real-time scenarios, these applications can exploit information embedded in the compressed video to meet the demand for efficient video content analysis. However, compressed-domain video analysis remains challenging because compressed features are sparse and noisy. This is because conventional encoders are designed solely to optimise compression, which does not necessarily produce compressed features that describe the content well. Compression efficiency is critical for optimal use of bandwidth and storage resources; on the other hand, other uses of video, such as content-based applications, would benefit from a more accurate representation of content in the compressed stream.
To achieve fast and reliable video content analysis, this thesis investigates alternatives to conventional video encoding that enhance the accuracy of compressed features while maintaining compliance with mainstream video coding standards. A generic Application-Aware Video Coding framework is proposed, which incorporates the accuracy of compressed features alongside the rate-distortion optimisation criterion.
Focusing on encoder motion estimation for temporal prediction, the proposed framework was evaluated in three stages. First, a region-based video encoder optimisation criterion was developed to identify foreground regions and encode them using accurate motion data. The optimisation is steered by a hierarchical motion estimation based on intensity gradients. Second, this was extended to a motion-accuracy-constrained rate-distortion optimisation that uses the spatial and temporal correlation of motion activity in the local neighbourhood to accommodate multimodal motion.
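The coarse-to-fine idea behind hierarchical motion estimation can be sketched as follows. This is not the thesis's estimator: it assumes that intensity gradients enter by matching blocks on gradient magnitude instead of raw intensity, and it uses a two-level pyramid with full-search SAD block matching. All function names and parameters (`size`, `radius`) are hypothetical. Vectors follow the usual convention that the block at (y, x) in the current frame matches the reference-frame block at (y + dy, x + dx).

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel intensity-gradient magnitude (central differences)."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def downsample(img):
    """Halve resolution by 2x2 averaging, cropping an odd last row/column."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def block_search(cur, ref, y, x, size, centre, radius):
    """Full search within +/-radius of `centre`; returns best (dy, dx) by SAD."""
    block = cur[y:y + size, x:x + size]
    best, best_cost = centre, np.inf
    for dy in range(centre[0] - radius, centre[0] + radius + 1):
        for dx in range(centre[1] - radius, centre[1] + radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + size > ref.shape[0] or rx + size > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = np.abs(block - ref[ry:ry + size, rx:rx + size]).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def hierarchical_me(cur, ref, size=8, radius=2):
    """Two-level coarse-to-fine block matching on gradient-magnitude images."""
    gc, gr = gradient_magnitude(cur), gradient_magnitude(ref)
    cc, cr = downsample(gc), downsample(gr)
    half = size // 2
    field = {}
    for y in range(0, cur.shape[0] - size + 1, size):
        for x in range(0, cur.shape[1] - size + 1, size):
            # Coarse search at half resolution gives a cheap initial guess...
            coarse = block_search(cc, cr, y // 2, x // 2, half, (0, 0), radius)
            guess = (2 * coarse[0], 2 * coarse[1])
            # ...which is refined by a small search at full resolution.
            field[(y, x)] = block_search(gc, gr, y, x, size, guess, 1)
    return field
```

The coarse level covers a displacement of up to `2 * radius` pixels while searching only a small window, which is the usual motivation for hierarchical estimation.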
Finally, an unconstrained optimisation model combining Rate-Distortion and Motion-Description-Error was developed, leading to a fully scalable implementation of the framework. A motion-calibrated synthetic dataset covering different scene complexities was designed to analyse the framework under known motion content. A mathematical model of Motion-Description-Error was derived as a function of the optimisation parameters, scene complexity and encoder configuration. It is demonstrated that the proposed optimisation framework can reduce the noise in estimated motion by 50-60% without compromising rate-distortion performance or encoder complexity.
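An unconstrained model of this kind can be pictured as a Lagrangian cost evaluated for each motion-vector candidate. The sketch assumes an additive form J = D + lam·R + mu·E, with SAD as the distortion D, the signed Exp-Golomb code length (used for motion vector differences in H.264 CAVLC) as the rate proxy R, and, as a hypothetical stand-in for the Motion-Description-Error E, the L1 distance from the median of neighbouring vectors. The weights `lam` and `mu` and all names are assumptions, not the derived model.

```python
import numpy as np

def se_bits(v):
    """Length in bits of the signed Exp-Golomb code se(v)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * ((code_num + 1).bit_length() - 1) + 1

def rate_bits(mv, pred):
    """Rate proxy: bits to code the difference between mv and its predictor."""
    return se_bits(mv[0] - pred[0]) + se_bits(mv[1] - pred[1])

def description_error(mv, neighbours):
    """Assumed MDE proxy: L1 distance from the component-wise median
    of the neighbouring motion vectors (spatial motion coherence)."""
    med = np.median(np.asarray(neighbours), axis=0)
    return float(abs(mv[0] - med[0]) + abs(mv[1] - med[1]))

def select_mv(candidates, sad, pred, neighbours, lam=4.0, mu=8.0):
    """Pick the candidate minimising J = D + lam * R + mu * E.

    sad: dict mapping each candidate (dy, dx) to its block distortion.
    """
    def cost(mv):
        return (sad[mv]
                + lam * rate_bits(mv, pred)
                + mu * description_error(mv, neighbours))
    return min(candidates, key=cost)
```

With `candidates=[(0, 0), (3, -2)]`, SAD values of 100 and 96, and neighbours clustered around (0, 0), the pure-distortion choice (`lam=0.0, mu=0.0`) is `(3, -2)`, while the default weights prefer `(0, 0)`: a slightly worse match that agrees with its neighbourhood. Setting only `mu=0` recovers a conventional rate-constrained search.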
We propose a new approach to fast compressed-domain analysis that uses motion data from encoded bit-streams to achieve low-complexity object tracking in surveillance videos. The algorithm estimates the trajectories of video objects using compressed-domain motion vectors extracted directly from standard H.264/MPEG-4 Advanced Video Coding (AVC) and Scalable Video Coding (SVC) bit-streams. Experimental results show tracking precision comparable to standard uncompressed-domain algorithms, while maintaining low computational complexity and fast processing. This makes the algorithm suitable for real-time and streaming applications where good estimates of object trajectories must be computed quickly.
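A minimal sketch of compressed-domain tracking under stated assumptions: motion vectors have already been parsed from the bit-stream into a per-block array in integer-pel units (the parsing is not shown), each vector points from a current-frame block to its match in the previous frame, and the object's bounding box is shifted by the negated median vector of the non-zero blocks it covers. The function name, block size and median rule are illustrative, not the published algorithm.

```python
import numpy as np

def track_box(box, mv_field, block=4):
    """Move a bounding box by the dominant motion of the blocks it covers.

    box:      (x0, y0, x1, y1) in pixels of the previous frame, x1/y1 exclusive.
    mv_field: array of shape (block_rows, block_cols, 2) holding (dx, dy) per
              block in integer pixels; each vector points from a current-frame
              block back to its match in the previous (reference) frame.
    """
    x0, y0, x1, y1 = box
    rows = slice(y0 // block, -(-y1 // block))  # ceil division for the far edge
    cols = slice(x0 // block, -(-x1 // block))
    mvs = np.asarray(mv_field)[rows, cols].reshape(-1, 2)
    moving = mvs[np.any(mvs != 0, axis=1)]  # static/skipped blocks carry no cue
    if len(moving) == 0:
        return box
    dx, dy = np.median(moving, axis=0)  # median is robust to outlier vectors
    # The vectors point backwards in time, so the object moved by their negation.
    sx, sy = int(round(-dx)), int(round(-dy))
    return (x0 + sx, y0 + sy, x1 + sx, y1 + sy)
```

Applying `track_box` frame by frame yields a trajectory without decoding any pixels, which is where the low complexity of compressed-domain tracking comes from.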
This work introduces a framework for video summarisation and browsing that exploits the inherently hierarchical compressed-domain features of scalable video to achieve efficient dynamic video summarisation. The approach lets generated video summaries adapt instantly to the available channel bandwidth and display resources. Using compressed-domain features, motion activity is analysed efficiently and hierarchically at different layers of complexity. A contour evolution algorithm then generates a scale space of temporal video descriptors, enabling rapid video summarisation. Given the spatial resources of the terminal display and the generated video summary, the final browsing layout is produced using an unsupervised robust spectral clustering technique and a fast discrete optimisation algorithm. Results show excellent scalability of the video summaries and good algorithmic efficiency.
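The contour evolution step can be illustrated with discrete curve evolution (DCE), which is an assumption about the kind of algorithm meant. A per-frame motion-activity value is treated as a polyline over frame index, and the interior vertex with the lowest relevance K = beta·l1·l2/(l1 + l2) (beta the turning angle, l1 and l2 the adjacent segment lengths) is removed repeatedly. The removal order forms a scale space, so a summary of any length can be read off instantly. Function names are hypothetical, and the spectral clustering layout step is not covered.

```python
import math

def turn_angle(a, b, c):
    """Absolute turning angle (radians, 0..pi) at vertex b of polyline a-b-c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1)
    return min(d, 2 * math.pi - d)

def relevance(a, b, c):
    """DCE relevance K = beta * l1 * l2 / (l1 + l2) of vertex b."""
    l1, l2 = math.dist(a, b), math.dist(b, c)
    if l1 + l2 == 0:
        return 0.0
    return turn_angle(a, b, c) * l1 * l2 / (l1 + l2)

def dce_order(activity):
    """Frame indices in order of removal, least relevant first.

    Endpoints are never removed, so the last entries are the most
    salient frames of the activity curve.
    """
    pts = [(float(i), float(v)) for i, v in enumerate(activity)]
    alive = list(range(len(pts)))
    removed = []
    while len(alive) > 2:
        scores = [relevance(pts[alive[j - 1]], pts[alive[j]], pts[alive[j + 1]])
                  for j in range(1, len(alive) - 1)]
        j = 1 + scores.index(min(scores))  # ties go to the earliest vertex
        removed.append(alive.pop(j))
    return removed

def summary(activity, k):
    """Sorted indices of k key frames (the two endpoints count towards k)."""
    order = dce_order(activity)
    take = max(0, len(order) - (k - 2))
    return sorted({0, len(activity) - 1, *order[take:]})
```

For a curve with a single activity peak, such as `[0, 0, 0, 5, 0, 0, 0]`, the peak frame is the last to be removed, so `summary(activity, 3)` returns the endpoints plus the peak. Because relevance depends on the turning angle, the frame-index and activity axes should be scaled comparably before running the evolution.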