Biological database quality through the lens of the scientific literature
- Speaker: Prof Karin Verspoor (The University of Melbourne)
- Host: Sophia Ananiadou
- 27th April 2017 at 14:00 in Kilburn L.T 1.5
Biological databases, such as gene and protein databases, are vast resources that play a critical role in day-to-day interpretation and analysis in biological research. Because of their size, however, it is very difficult to prevent errors and inconsistencies from creeping in. In this talk, I will describe a series of studies of errors in biological sequence databases, focusing on duplicate and erroneous records. I will introduce methods based on both machine learning and information retrieval for identifying such records. The methods draw on internal database attributes as well as external record-associated information (specifically, the scientific literature associated with the biological entities represented in the records) to enable effective error-checking of biological database records. The research points to the value of the literature as a resource for biological database quality control.