Semantic enrichment of publications, references and datasets to facilitate knowledge discovery
- Speaker: Dr David Shotton (University of Oxford)
- Host: Stephen Pettifer
- 8th December 2010 at 14:15 in Lecture Theatre 1.4, Kilburn Building
Open publication of bibliographic and citation data relating to scientific publications is essential if we in academia are to reclaim our scholarship from the stranglehold of commercial publishers, to whom we have surrendered such information for the past half century. Publication of such data as Open Linked Data on the Semantic Web, which would enable the development of services for knowledge discovery from citation networks, requires that the relevant entities and relationships are accurately described using appropriate ontology classes and properties. To date, important work has been undertaken using domain-specific ontologies and knowledge management systems (GO, myExperiment, BioMoby, Bio2RDF, etc.) to describe biological systems and molecular interactions, and to make these data machine-readable and interoperable. However, little attention has been paid to the realm of publishing itself.
Using as a starting point the semantically enhanced version, which we created in 2008, of Reis et al. (2008) "Impact of environment and social gradient on Leptospira infection in urban slums", PLoS Neglected Tropical Diseases 2: e228 (http://dx.doi.org/10.1371/journal.pntd.0000228.x001), I will describe the need for ontologies to serve the publishing aspects of the practice of science, and will introduce the SPAR (Semantic Publishing and Referencing) Ontologies (http://bit.ly/9d8qAi), a suite of eight orthogonal and complementary ontologies developed for this purpose. I will then exemplify the utility of the SPAR Ontologies for describing various activities and entities across this domain - for example, typing the nature of citations, creating citation networks, encoding the metadata for citation contexts and citation counts, describing bibliographic objects, organizing reference collections, annotating document components, and describing publishing roles, status and workflows - and show how this might be important in real-world applications.
Finally, I will describe recently funded projects to publish scientific bibliographic and citation data as Open Linked Data (http://bit.ly/f356Bc and http://bit.ly/gn6iqo), and to publish in Dryad, a domain-specific repository for biological datasets (http://datadryad.org/dryaduk), the datasets that underpin peer-reviewed research articles.