Contextualised Multimedia Information Retrieval via Representation Learning
Multimedia Information Retrieval (MIR) is an important research area in AI that aims to extract semantic information from multimedia data sources. These include directly perceivable media such as audio, image and video, indirectly perceivable sources such as text and bio-signals, and non-perceivable sources such as bio-information and stock prices. In general, the main MIR tasks are summarisation of media content into a concise description via feature extraction, filtering of media descriptions via elimination of redundancy, and categorisation of media descriptions into classes to facilitate retrieval. In essence, the fundamental problem underlying all MIR tasks is how to bridge the gap between low-level multimedia data and the semantics conveyed by such data. However, accurate semantics cannot be determined until the context is given, as is perfectly exemplified in natural language understanding. We therefore believe that contextual information would be extremely useful in MIR if it can be captured and modelled.
Unlike much existing research in MIR, this project will investigate how to explore and exploit context information from multimedia annotation and side information sources to facilitate different MIR tasks. While there are other approaches to this problem, this project focuses on exploring the synergy between contextualised semantic representations and low-level descriptors to bridge the aforementioned gap via machine learning. The main issues to be investigated include novel multimedia feature extraction methods suitable for contextualised MIR, contextualised semantics modelling, and effective media data descriptors and their joint latent representations. In particular, these issues will be investigated by taking real environmental factors, e.g., noise and mismatch conditions, into account. Based on the proposed approaches, a high-performance prototype for a target application will be established, e.g., personalised MIR for video stream retrieval. While relevant fundamental research is expected to be conducted, the project is suitable for a candidate who has a clearly targeted application area in mind.
To undertake this project, it is essential to have a good background in machine learning and multimedia (signal processing) as well as excellent programming skills.
If you are interested in this project, please first visit my research student page: http://staff.cs.manchester.ac.uk/~kechen/ for the required materials and information prior to contacting me.