
Department of Computer Science


Data Wrangling

Primary supervisor


Funding

  • Competition Funded Project (European/UK Students Only)

This research project is one of a number of projects at this institution. It is in competition for funding with one or more of these projects. Usually, the project that attracts the best applicant will be awarded the funding. The funding is available to citizens of a number of European countries (including the UK). In most cases, this will include all EU nationals. However, full funding may not be available to all applicants, and you should read the full department and project details for further information.

Project description

Data wrangling is the process by which the data required by an application is identified, extracted, cleaned and integrated to yield a data set that is suitable for exploration and analysis. Although Extract, Transform and Load (ETL) techniques and platforms are widely used, they often require significant manual work from technical and domain experts at different stages of the process. When confronted with the four Vs of big data (volume, velocity, variety and veracity), this manual intervention can make ETL prohibitively expensive.
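As a rough illustration of the extract, clean and integrate stages described above, the sketch below wrangles two small in-memory CSV sources into a single data set. It is a minimal, hand-written example under stated assumptions: the sources, column names and cleaning rules (`CUSTOMERS_CSV`, `ORDERS_CSV`, `clean_customers`, `integrate`) are all hypothetical, and every rule is fixed by hand, which is exactly the kind of manual effort the project aims to reduce.

```python
import csv
import io

# Hypothetical sources: the same customers, with inconsistent formatting.
CUSTOMERS_CSV = """id,name,city
1, Alice ,manchester
2,Bob,
3,  Carol,Leeds
,Dave,York
"""

ORDERS_CSV = """customer_id,total
1,20.50
3,7.25
1,4.00
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse a CSV source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_customers(rows: list[dict[str, str]]) -> list[dict]:
    """Clean: trim whitespace, normalise city names, drop rows without an id."""
    cleaned = []
    for row in rows:
        key = row["id"].strip()
        if not key:
            continue  # a row with no identifier cannot be integrated
        city = row["city"].strip().title() or None  # empty city becomes None
        cleaned.append({"id": key, "name": row["name"].strip(), "city": city})
    return cleaned


def integrate(customers: list[dict], orders: list[dict[str, str]]) -> list[dict]:
    """Integrate: sum each customer's orders and attach the total to the record."""
    totals: dict[str, float] = {}
    for order in orders:
        cid = order["customer_id"].strip()
        totals[cid] = totals.get(cid, 0.0) + float(order["total"])
    return [{**c, "order_total": round(totals.get(c["id"], 0.0), 2)} for c in customers]


dataset = integrate(clean_customers(extract(CUSTOMERS_CSV)), extract(ORDERS_CSV))
for record in dataset:
    print(record)
```

Each hard-coded rule here (which column is the key, how cities are normalised, which rows are dropped) is a decision that would normally require a technical or domain expert.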

As a result, we are interested in enabling cost-effective approaches to data wrangling, typically through automation or suggestion. In automation, individual or multiple steps within the data wrangling process are carried out by software, using evidence about what the user requires [1]. In suggestion, given a current situation, the user is informed of possible next steps from which to choose. In both cases, it is necessary to explain the proposed actions to the user, and allow additional information from the user to steer the steps that are followed [2].
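The suggestion mode described above can be sketched as ranking candidate next steps, showing each with an explanation, and letting user feedback re-order them. This is a hypothetical illustration only, not the approach of [1] or [2]: the `Step` type, the numeric scores and the additive feedback adjustment are assumptions chosen for brevity. In an automated variant, the top-ranked step would simply be applied without asking the user.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str     # hypothetical identifier of a wrangling operation
    reason: str   # explanation shown to the user
    score: float  # hypothetical evidence-based estimate of usefulness


def suggest(candidates: list[Step], feedback: dict[str, float], k: int = 3) -> list[Step]:
    """Rank candidate next steps, adding the user's feedback to each score.

    Returns new Step objects; the input candidates are not modified.
    """
    adjusted = [
        Step(s.name, s.reason, s.score + feedback.get(s.name, 0.0)) for s in candidates
    ]
    return sorted(adjusted, key=lambda s: s.score, reverse=True)[:k]


# Hypothetical candidates with made-up explanations and scores.
candidates = [
    Step("deduplicate_rows", "several rows share the same key", 0.6),
    Step("normalise_dates", "multiple date formats found in one column", 0.5),
    Step("join_with_orders", "customer ids match most order records", 0.7),
]

# Without feedback, the join is suggested first, with its explanation.
for step in suggest(candidates, feedback={}):
    print(f"{step.name}: {step.reason}")

# The user marks the join as premature; the ranking is steered accordingly.
print([s.name for s in suggest(candidates, feedback={"join_with_orders": -0.5})])
```

Keeping the explanation alongside each step reflects the requirement that proposed actions be explained to the user before they are chosen.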

While we have recently worked on the development of end-to-end automation for data preparation [1], we are also interested in developing techniques that integrate with the notebook environments that are widely used by data scientists.

[1] Nikolaos Konstantinou, Edward Abel, Luigi Bellomarini, Alex Bogatu, Cristina Civili, Endri Irfanie, Martin Koehler, Lacramioara Mazilu, Emanuel Sallinger, Alvaro A. A. Fernandes, Georg Gottlob, John A. Keane, Norman W. Paton: VADA: an architecture for end user informed data preparation. J. Big Data 6: 74 (2019).

[2] Nikolaos Konstantinou, Norman W. Paton: Feedback driven improvement of data preparation pipelines. Inf. Syst. 92: 101480 (2020).

Person specification

For information

Essential

Applicants will be required to evidence the following skills and qualifications.

  • You must be capable of performing at a very high level.
  • You must have a self-driven interest in uncovering and solving unknown problems and be able to work hard and creatively without constant supervision.

Desirable

Applicants will be required to evidence the following skills and qualifications.

  • You will have good time management.
  • You will possess determination (which is often more important than qualifications), although you'll need a good amount of both.

General

Applicants will be required to address the following.

  • Comment on your transcript/predicted degree marks, outlining both strong and weak points.
  • Discuss your final-year undergraduate project work and, if appropriate, your MSc project work.
  • How well does your previous study prepare you for undertaking Postgraduate Research?
  • Why do you believe you are suitable for doing Postgraduate Research?