The broad objective of dataspaces is to make structured data available in an integrated way, with minimal effort spent on developing and maintaining the mappings that are central to classical data integration.
In response to this emphasis on automation and reduced cost, we seek to support the incremental refinement of automatically generated mappings, using information from different sources (e.g. the users, or the data in the sources being integrated). The aim of our project is therefore to investigate how a dataspace management system (DSMS) can provide incremental integration of heterogeneous sources.
We assume that for incrementality to be effective, a DSMS should:
- provide different qualities of data integration at different costs,
- indicate to users the likely quality, or at least the origin, of query answers,
- allow users to influence the behaviour of a dataspace by stating their non-functional requirements, providing feedback on the quality of answers, and supplying sample answers that would meet their expectations,
- enable users to share or to personalise their usage of the dataspace based on their preferences or distinctive requirements.
The ensuing objectives are to improve understanding of dataspaces by designing, evaluating and revising techniques that enable incremental, user-directed data integration. In particular, we propose:
- To design a software framework that supports the flexible development of dataspace management systems through the replacement of key components, including schema mapping and result ranking algorithms.
- To investigate the annotation of schema mappings with measures of their likely quality from a range of sources.
- To explore how lineage information, combined with indications as to likely quality, can be conveyed to users, and in turn to identify how feedback from users on the quality of results can be reflected in annotations.
- To investigate how the quality of query answers and mappings can be improved given explicit user direction.
- To explore how annotations and user preferences can be used in the ranking of query results.
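The objectives above (replaceable components, quality annotations on mappings, user feedback, and ranking) can be illustrated with a minimal sketch. It is not the project's implementation: the names `Mapping`, `record_feedback`, `rank_by_precision` and `select_mappings`, the use of estimated precision as the quality measure, and the neutral default of 0.5 for mappings without feedback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Mapping:
    """A candidate schema mapping annotated with user feedback counts (hypothetical)."""
    name: str
    confirmed: int = 0  # results from this mapping that users marked as correct
    rejected: int = 0   # results from this mapping that users marked as incorrect

    def precision(self) -> float:
        """Estimate precision from the feedback gathered so far."""
        judged = self.confirmed + self.rejected
        # Assumption: with no feedback yet, use a neutral estimate of 0.5.
        return self.confirmed / judged if judged else 0.5


def record_feedback(mapping: Mapping, is_correct: bool) -> None:
    """Fold one piece of user feedback into the mapping's annotation."""
    if is_correct:
        mapping.confirmed += 1
    else:
        mapping.rejected += 1


# A ranker is any function that orders mappings, so it can be swapped out
# without touching the selection logic.
Ranker = Callable[[List[Mapping]], List[Mapping]]


def rank_by_precision(mappings: List[Mapping]) -> List[Mapping]:
    """Order mappings from highest to lowest estimated precision."""
    return sorted(mappings, key=lambda m: m.precision(), reverse=True)


def select_mappings(mappings: List[Mapping], ranker: Ranker,
                    min_precision: float) -> List[Mapping]:
    """Keep, in ranked order, the mappings that meet the user's quality threshold."""
    return [m for m in ranker(mappings) if m.precision() >= min_precision]
```

In this sketch, each call to `record_feedback` refines the annotation incrementally, and the `min_precision` argument stands in for a user's non-functional requirement. Replacing `rank_by_precision` with another function of the same type mirrors the component replacement that the framework objective describes.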
The project is supported by EPSRC.
Hedeler, C., Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Embury, S.M., Mao, L., Guo, C., Pay-As-You-Go Mapping Selection in Dataspaces, Proc. of the ACM SIGMOD International Conference on Management of Data (SIGMOD), ACM, 2011 (demo paper).
Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Hedeler, C., Embury, S.M., User Feedback as a First Class Citizen in Information Integration Systems, Proc. 5th Biennial Conference on Innovative Data Systems Research (CIDR), 175-183, 2011.
Belhajjame, K., Paton, N.W., Embury, S.M., Fernandes, A.A.A., Hedeler, C., Feedback-Based Annotation, Selection and Refinement of Schema Mappings for Dataspaces, Proc. 13th International Conference on Extending Database Technology (EDBT), 573-584, ACM, 2010.
Hedeler, C., Belhajjame, K., Paton, N.W., Campi, A., Fernandes, A.A.A., Embury, S.M., Dataspaces, in Search Computing: Challenges and Directions, S. Ceri, M. Brambilla (eds), Springer, 114-134, 2010.
Hedeler, C., Belhajjame, K., Mao, L., Paton, N.W., Fernandes, A.A.A., Guo, C., Embury, S.M., Flexible Dataspace Management Through Model Management, Proc. of the 2010 EDBT/ICDT Workshops, ACM Conference Proceeding Series, Vol. 426, 2010.
Hedeler, C., Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Embury, S.M., Dimensions of Dataspaces, Proc. 26th British National Conference on Databases, Springer-Verlag, 55-66, 2009.
Mao, L., Belhajjame, K., Paton, N.W., Fernandes, A.A.A., Defining and Using Schematic Correspondences for Automatically Generating Schema Mappings, Proc. 21st Intl. Conference on Advanced Information Systems (CAiSE), Springer-Verlag, 79-93, 2009.