Fundamental and applied investigations into the potential of provenance metadata in the context of collaborative, open science
Scientific datasets are created, processed, and analysed by computational means, in particular workflows. The provenance of such datasets (i.e., their processing history) provides a persistent trace of workflow execution and data lineage. This forms a corpus of contextual metadata that, in principle, can help users better understand both the data products and the processes used to generate them. In turn, this would facilitate exchange and collaboration in science, making it more efficient on a global scale.
In practice, however, the benefits of provenance have yet to materialise. Provenance metadata accumulates over time as different groups perform many experiments. This accumulation presents technical challenges, in terms of large-scale data management, as well as opportunities. This project aims to investigate the unfulfilled potential of provenance metadata management, in combination with other e-science technologies.
In particular, the research focuses on how vast volumes of provenance metadata can be put to use in emerging scenarios where experimental and computational science is becoming increasingly pervasive, open, and collaborative. This includes supporting new models of scientific collaboration, personalised information filtering and selection, and, more broadly, scientists' decisions in their increasingly data-centric research environment.
The successful candidate will work within a worldwide community of top-class researchers, will be exposed to cutting-edge information management technologies, and will be expected to contribute to the next generation of e-science research.