Data Wrangling
Primary supervisor
Other projects with the same supervisor
- Retrieval-Augmented Generation with Data Lakes and Knowledge Graphs
- Data Integration & Exploration on Data Lakes
- Finding a way through the Fog from the Edge to the Cloud
- Data Lake Exploration with Modern Artificial Intelligence Techniques
- Fishing in the Data Lake
Funding
- Competition Funded Project (European/UK Students Only)
This research project is one of a number of projects at this institution. It is in competition for funding with one or more of these projects. Usually, the project that attracts the best applicant will be awarded the funding. The funding is available to citizens of a number of European countries (including the UK). In most cases, this will include all EU nationals. However, full funding may not be available to all applicants, and you should read the full department and project details for further information.
Project description
Data wrangling is the process by which the data required by an application is identified, extracted, cleaned and integrated to yield a data set that is suitable for exploration and analysis. Although there are widely used Extract, Transform and Load (ETL) techniques and platforms, they often require significant manual work from technical and domain experts at different stages of the process. When confronted with the four Vs of big data (volume, velocity, variety and veracity), this manual intervention may make ETL prohibitively expensive.
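As a rough illustration of the stages named above (identifying the required data, extracting it, cleaning it and integrating it), the following Python sketch runs them by hand over two small in-memory sources. The functions `extract`, `clean` and `integrate`, the record layout and the sample data are all hypothetical, and this is not the VADA approach described in [1]; it only shows where the manual decisions arise.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: each row maps field names to string values.
Record = Dict[str, Optional[str]]

def extract(rows: List[Record], fields: List[str]) -> List[Record]:
    """Keep only the fields the application was identified as needing."""
    return [{f: row.get(f) for f in fields} for row in rows]

def clean(rows: List[Record], key: str) -> List[Record]:
    """Trim whitespace, treat empty strings as missing, drop rows with no key."""
    cleaned: List[Record] = []
    for row in rows:
        tidy: Record = {}
        for field, value in row.items():
            if isinstance(value, str):
                value = value.strip() or None
            tidy[field] = value
        if tidy.get(key) is not None:
            cleaned.append(tidy)
    return cleaned

def integrate(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner-join two cleaned sources on a shared key (assumes keys are unique in `right`)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Usage: the "identify" step is the hand-picked field lists below.
customers = [
    {"id": " 1 ", "name": "Ada", "email": "ada@example.org"},
    {"id": "", "name": "Bob"},
    {"id": "2", "name": " Cy "},
]
orders = [{"id": "1", "total": "10"}, {"id": "3", "total": "5"}]

people = clean(extract(customers, ["id", "name"]), key="id")
sales = clean(extract(orders, ["id", "total"]), key="id")
result = integrate(people, sales, key="id")
print(result)  # prints [{'id': '1', 'name': 'Ada', 'total': '10'}]
```

Each step encodes a decision (which fields to keep, what counts as missing, which key to join on) that is made manually here; these are the kinds of decisions that automation or suggestion would aim to take over.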
We are therefore interested in enabling cost-effective approaches to data wrangling, typically through automation or suggestion. In automation, individual or multiple steps within the data wrangling process are carried out by software, using evidence about what the user requires [1]. In suggestion, the user is informed of possible next steps, given the current state of the process, and chooses among them. In both cases, the proposed actions must be explained to the user, and additional information from the user must be able to steer the steps that are followed [2].
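The suggestion mode can be sketched in the same hedged way. The Python below is an illustrative assumption, not the technique of [1] or [2]: it inspects a data set, proposes candidate cleaning steps together with an explanation of why each is proposed, and lets user feedback steer later proposals by suppressing steps the user has rejected. `Suggestion`, `suggest_next_steps`, the two detection rules and the rejection set are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Record = Dict[str, Optional[str]]

@dataclass
class Suggestion:
    action: str       # hypothetical step name, e.g. "fill_missing"
    column: str       # column the step would apply to
    explanation: str  # reason shown to the user before they choose

def suggest_next_steps(rows: List[Record], rejected: Set[Tuple[str, str]]) -> List[Suggestion]:
    """Propose candidate cleaning steps, omitting (action, column) pairs the user rejected."""
    suggestions: List[Suggestion] = []
    columns = sorted({col for row in rows for col in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        # Rule 1: missing values (None or empty string) suggest a fill step.
        missing = sum(1 for v in values if v is None or v == "")
        if missing:
            suggestions.append(Suggestion(
                "fill_missing", col, f"{missing} of {len(values)} values are missing"))
        # Rule 2: leading or trailing spaces suggest a trim step.
        padded = sum(1 for v in values if isinstance(v, str) and v != v.strip())
        if padded:
            suggestions.append(Suggestion(
                "trim_whitespace", col,
                f"{padded} of {len(values)} values have leading or trailing spaces"))
    # User feedback steers the process: rejected steps are not proposed again.
    return [s for s in suggestions if (s.action, s.column) not in rejected]

# Usage: show suggestions with explanations, then record the user's rejection.
rows = [{"city": " Leeds", "zip": None}, {"city": "York", "zip": "YO1"}]
rejected: Set[Tuple[str, str]] = set()
first = suggest_next_steps(rows, rejected)
for s in first:
    print(f"{s.action}({s.column}): {s.explanation}")
# prints:
# trim_whitespace(city): 1 of 2 values have leading or trailing spaces
# fill_missing(zip): 1 of 2 values are missing

rejected.add(("fill_missing", "zip"))  # the user declines to impute postcodes
second = suggest_next_steps(rows, rejected)
print([s.action for s in second])  # prints ['trim_whitespace']
```

Treating feedback as a simple set of rejected steps is a deliberate simplification; richer feedback could, for example, re-rank suggestions or revise the rules that generate them.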
While we have recently worked on the development of end-to-end automation for data preparation [1], we are also interested in developing techniques that integrate with the notebook environments that are widely used by data scientists.
[1] Nikolaos Konstantinou, Edward Abel, Luigi Bellomarini, Alex Bogatu, Cristina Civili, Endri Irfanie, Martin Koehler, Lacramioara Mazilu, Emanuel Sallinger, Alvaro A. A. Fernandes, Georg Gottlob, John A. Keane, Norman W. Paton: VADA: an architecture for end user informed data preparation. J. Big Data 6: 74 (2019).
[2] Nikolaos Konstantinou, Norman W. Paton: Feedback driven improvement of data preparation pipelines. Inf. Syst. 92: 101480 (2020).
Person specification
For information
- Candidates must hold a minimum of an upper Second Class UK Honours degree or international equivalent in a relevant science or engineering discipline.
- Candidates must meet the School's minimum English Language requirement.
- Candidates will be expected to comply with the University's policies and practices of equality, diversity and inclusion.
Essential
Applicants will be required to evidence the following skills and qualifications.
- You must be capable of performing at a very high level.
- You must have a self-driven interest in uncovering and solving unknown problems and be able to work hard and creatively without constant supervision.
Desirable
Applicants will be required to evidence the following skills and qualifications.
- You will have good time management.
- You will possess determination (which is often more important than qualifications), although you'll need a good amount of both.
General
Applicants will be required to address the following.
- Comment on your transcript/predicted degree marks, outlining both strong and weak points.
- Discuss your final-year undergraduate project work and, if appropriate, your MSc project work.
- How well does your previous study prepare you for undertaking Postgraduate Research?
- Why do you believe you are suitable for doing Postgraduate Research?