Data Fusion (of Everything)
- Speaker: Prof Blaz Zupan (University of Ljubljana)
- Host: Goran Nenadic
- 10th February 2016 at 14:00 in KB L.T. 1.4
Have you ever been overwhelmed by data, not only by their volume but also by their sheer multitude? In many fields, including the life sciences, data abound. One of the grand challenges of machine learning is to infer a predictive model by jointly considering all the available data sets. In bioinformatics, this would mean integrating data sets as diverse as, say, gene expression, interactions, functional annotations, phenotype information, various ontologies, disease markers, structural properties of chemicals, Facebook and Twitter (ok, I perhaps went too far with the last two items). That is, the learning algorithm would need to consider all available information, even if it is only circumstantially related to the problem at hand. At the University of Ljubljana we have developed a computational approach that uses collective matrix tri-factorization and can consider such diverse data sets. Tri-factorization infers a joint latent data model, which can be used for various data mining tasks, such as class prediction and ranking. Our experiments show that a broad integration of heterogeneous data sets can substantially increase predictive accuracy. In the talk, I will present the intuition behind data fusion by matrix tri-factorization and show its application in several recent studies, including the discovery of new bacterial response genes in the social amoeba Dictyostelium.
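To give a rough feel for the idea of a joint latent model, the sketch below factorizes two relation matrices that share one object type (say, genes by annotation terms and genes by diseases) as R12 ≈ G1 S12 G2^T and R13 ≈ G1 S13 G3^T, with the gene factor G1 shared by both. It is only an illustration, not the speaker's algorithm: it assumes a simplified nonnegative variant fitted with standard multiplicative updates, and the function name `fuse`, the ranks, and the synthetic data are all hypothetical.

```python
import numpy as np


def fuse(R12, R13, k1, k2, k3, n_iter=200, seed=0, eps=1e-9):
    """Jointly tri-factorize R12 ~ G1 S12 G2^T and R13 ~ G1 S13 G3^T.

    Illustrative nonnegative variant with multiplicative updates;
    G1 is shared, so both data sets shape the same latent profiles.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = R12.shape
    n3 = R13.shape[1]
    G1, G2, G3 = rng.random((n1, k1)), rng.random((n2, k2)), rng.random((n3, k3))
    S12, S13 = rng.random((k1, k2)), rng.random((k1, k3))
    errors = []
    for _ in range(n_iter):
        # Shared factor: numerator and denominator sum over both relations.
        H12, H13 = S12 @ G2.T, S13 @ G3.T
        G1 *= (R12 @ H12.T + R13 @ H13.T) / (G1 @ (H12 @ H12.T + H13 @ H13.T) + eps)
        # Factors specific to the second and third object types.
        W12, W13 = G1 @ S12, G1 @ S13
        G2 *= (R12.T @ W12) / (G2 @ (W12.T @ W12) + eps)
        G3 *= (R13.T @ W13) / (G3 @ (W13.T @ W13) + eps)
        # Middle factors relate latent components of two object types.
        S12 *= (G1.T @ R12 @ G2) / (G1.T @ G1 @ S12 @ G2.T @ G2 + eps)
        S13 *= (G1.T @ R13 @ G3) / (G1.T @ G1 @ S13 @ G3.T @ G3 + eps)
        errors.append(
            np.linalg.norm(R12 - G1 @ S12 @ G2.T) ** 2
            + np.linalg.norm(R13 - G1 @ S13 @ G3.T) ** 2
        )
    return G1, G2, G3, S12, S13, errors


# Synthetic, made-up data: 30 "genes", 20 "terms", 15 "diseases".
data_rng = np.random.default_rng(1)
R12 = data_rng.random((30, 4)) @ data_rng.random((4, 20))
R13 = data_rng.random((30, 4)) @ data_rng.random((4, 15))

G1, G2, G3, S12, S13, errors = fuse(R12, R13, k1=4, k2=3, k3=3)
R12_hat = G1 @ S12 @ G2.T  # reconstructed scores, usable for ranking
```

In such a setup, the reconstructed matrix `R12_hat` could be read as a score for each gene-term pair, which is one way a fused model might be used for ranking or class prediction.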