
Department of Computer Science

Data Integration & Exploration on Data Lakes

Primary supervisor

Additional supervisors

  • Andre Freitas



  • Self-Funded Students Only

If you have the correct qualifications and access to your own funding, either from your home country or your own finances, your application to work with this supervisor will be considered.

Project description

Data Lakes are emerging as data management infrastructures for storing data in various schemata and structural forms. Their goal is to serve as a single entry point for the data analysis process across highly heterogeneous datasets, supporting analytical tasks following a schema-on-read approach, in which data is discovered and integrated when it is to be used. Due to their semantic and structural heterogeneity, Data Lakes bring integration challenges to a new scale of complexity.

The Information Management Group at the University of Manchester invites applications for PhD candidates in the area of data integration and exploration on Data Lakes. PhD projects in this area will explore how contemporary techniques in Natural Language Processing (such as Open Information Extraction, Distributional Semantics and Semantic Parsing) can be used as a foundation to support exploratory data analysis on real-world data lakes.

Examples of research challenges include:

  • How to scale the integration of unstructured, semi-structured and structured datasets.
  • How to support end-users in exploratory data analysis (for example, through natural language questions).
  • How to use information embedded in large-scale corpora to support data integration.
  • How to use contemporary techniques in one-shot machine learning to support data integration.

Applicants are expected to have:

  • An excellent undergraduate degree in Computer Science or Mathematics (or a related discipline) and, preferably, a relevant M.Sc. degree.
  • Confidence and independence in programming complex systems in Java or Python.
  • Previous academic or industry experience in Natural Language Processing or Data Science (desired).
  • Excellent report writing and presentation skills.

Please note that applicants must additionally satisfy the standard requirements for postgraduate studies at the University of Manchester, such as a first-class or high upper-second-class degree (or an equivalent international qualification) and English language qualifications, as stated in the PGR guidelines.

Qualified applicants are strongly encouraged to contact Norman Paton and Andre Freitas informally to discuss the application before applying.

Person specification

For information


Applicants will be required to evidence the following skills and qualifications.

  • You must be capable of performing at a very high level.
  • You must have a self-driven interest in uncovering and solving unknown problems and be able to work hard and creatively without constant supervision.


Applicants will be required to evidence the following skills and qualifications.

  • You will have good time management.
  • You will possess determination (which is often more important than qualifications), although you will need a good amount of both.


Applicants will be required to address the following.

  • Comment on your transcript/predicted degree marks, outlining both strong and weak points.
  • Discuss your final-year undergraduate project work and, if appropriate, your MSc project work.
  • How well does your previous study prepare you for undertaking Postgraduate Research?
  • Why do you believe you are suitable for doing Postgraduate Research?