Department of Computer Science


Retrieval Augmented Generation with Data Lakes and Knowledge Graphs

Primary supervisor

Additional supervisors

  • Norman Paton

Funding

  • Competition Funded Project (Students Worldwide)

This research project is one of a number of projects at this institution. It is in competition for funding with one or more of these projects. Usually the project which receives the best applicant will be awarded the funding. Applications for this project are welcome from suitably qualified candidates worldwide. Funding may only be available to a limited set of nationalities and you should read the full department and project details for further information.

Project description

Large Language Models (LLMs) such as T5, Llama and GPT-3.5/4 have achieved significant success in AI, particularly in natural language understanding, inference, and generation. However, they are still limited by several intrinsic issues, including hallucination, indecisiveness, lack of interpretability, failure to keep up with fresh data, and lack of domain-specific knowledge. Retrieval Augmented Generation (RAG) applies LLMs to reasoning or generation over given datasets by retrieving task-relevant data snippets, using techniques such as semantic embedding, graph construction and indexing. It has attracted wide attention because it can combine symbolic data with parametric knowledge to address these intrinsic issues. Meanwhile, Data Lakes, which can contain (semi-)structured tabular datasets of different types as well as unstructured documents, and Knowledge Graphs (including Ontologies), which are a formal method for representing graph-structured data and logic-equipped domain knowledge, cover the current data and knowledge management of the majority of applications, domains and organizations. Investigating RAG with Data Lakes and Knowledge Graphs is therefore a promising way to apply and augment LLMs, with high potential impact on AI and on domain applications such as AI assistants, semantic search, AI scientists, and regulatory compliance and reporting.
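
As a rough illustration of the retrieval step described above, the following minimal Python sketch ranks snippets against a question and assembles them into a prompt. It is a toy under stated assumptions, not a method proposed by the project: the bag-of-words `embed` function stands in for a learned semantic embedding model, the `SNIPPETS` list is hypothetical data standing in for content drawn from a Data Lake table, a Knowledge Graph triple and a document, and the names `retrieve` and `build_prompt` are invented for this example.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a semantic embedding: lower-cased token counts.
    # Assumption: a real system would use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, snippets, k=2):
    # Return the k snippets most similar to the query, best first.
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

def build_prompt(query, snippets, k=2):
    # Place the retrieved snippets ahead of the question; the resulting
    # string is what would be passed to an LLM (no model is called here).
    context = "\n".join(f"- {s}" for s in retrieve(query, snippets, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical snippets, flattened to text for this sketch.
SNIPPETS = [
    "table row city manchester country uk",   # from a tabular dataset
    "triple manchester located in england",   # from a knowledge graph
    "document the library opens at nine",     # from an unstructured document
]

if __name__ == "__main__":
    print(build_prompt("which country is manchester in", SNIPPETS))
```

Running the script prints a prompt whose context holds the two snippets that mention Manchester, while the unrelated document snippet is left out. Flattening table rows and triples to plain text is only one possible design; how best to retrieve and integrate such heterogeneous snippets is among the open questions the project raises.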

The Information Management Group at the University of Manchester invites applications from PhD candidates in Computer Science and Artificial Intelligence. Successful candidates will explore how contemporary techniques in Machine Learning, Natural Language Processing, Data Engineering, and Knowledge Representation and Reasoning can be used as a foundation for LLM reasoning and generation in tasks such as Question Answering and Fact Checking.
Examples of research challenges include: 1) how to retrieve data snippets from a Data Lake and/or Knowledge Graph efficiently and precisely; 2) how to integrate heterogeneous data snippets and resolve inconsistencies in their knowledge for LLM reasoning and generation; 3) how to support attribution and human-understandable explanation of LLM reasoning and generation using Data Lake and Knowledge Graph techniques.

Applicants are expected to have:

1. An excellent undergraduate degree in Computer Science or Mathematics (or related discipline), and preferably, a relevant M.Sc. degree.
2. Confidence and independence in programming complex systems.
3. Previous academic or industry experience in at least one relevant topic, such as Machine Learning, Natural Language Processing, Knowledge Representation and Reasoning, Knowledge Engineering, Data Engineering or Data Science.
4. Excellent report writing and presentation skills.

Please note that applicants must additionally satisfy the standard requirements for postgraduate studies at the University of Manchester, such as a first-class or high upper-second class degree (or an equivalent international qualification) and English language qualifications, as stated in the PGR guidelines.

Qualified applicants are strongly encouraged to informally contact Jiaoyan Chen (jiaoyan.chen@manchester.ac.uk) and Norman Paton (norman.paton@manchester.ac.uk) to discuss the application prior to applying.

Person specification

For information

Essential

Applicants will be required to evidence the following skills and qualifications.

  • You must be capable of performing at a very high level.
  • You must have a self-driven interest in uncovering and solving unknown problems and be able to work hard and creatively without constant supervision.

Desirable

Applicants will be required to evidence the following skills and qualifications.

  • You will have good time management.
  • You will possess determination (which is often more important than qualifications) although you'll need a good amount of both.

General

Applicants will be required to address the following.

  • Comment on your transcript/predicted degree marks, outlining both strong and weak points.
  • Discuss your final year Undergraduate project work - and if appropriate your MSc project work.
  • How well does your previous study prepare you for undertaking Postgraduate Research?
  • Why do you believe you are suitable for doing Postgraduate Research?