A quick way to learn a mixture of exponentially many linear models
- Speaker: Professor Geoff Hinton, FRS (University of Toronto)
- Host: Neil Lawrence
- 29th June 2009 at 14:00 in Lecture Theatre 1.4, Kilburn Building
Mixtures of linear models can be used to model data that lies on or near a smooth non-linear manifold. A proper Bayesian treatment can be applied to toy data to determine the number of models in the mixture and the dimensionality of each linear model, but this neurally uninspired approach completely misses the main problem: real data with many degrees of freedom in the manifold requires a mixture with an exponential number of components. It is quite easy to fit mixtures of 2^1000 linear models by using a few tricks. First, each linear model selects from a pool of shared factors using the selection rule that factors with negative values are ignored. Second, undirected linear models are used to simplify inference, and the models are trained by matching pairwise statistics. Third, Poisson noise is used to implement L1 regularization of the activities of the factors. The factors are then threshold-linear neurons with Poisson noise, and their positive integer activities are very sparse. Preliminary results suggest that these exponentially large mixtures work very well as modules for greedy, layer-by-layer learning of deep networks. Even with one eye closed, they outperform Support Vector Machines for recognizing 3-D images of objects from the NORB database.
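The selection rule and the Poisson noise described in the abstract can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions, not the speaker's implementation: the function names, the weight matrix W, the bias b, the layer sizes, and the use of a shared weight matrix for reconstruction are all hypothetical, and the training step (matching pairwise statistics) is not shown. The point is that each input switches on a subset of a shared pool of factors, so N factors define up to 2^N distinct linear models.

```python
import numpy as np

def factor_activities(v, W, b, rng):
    """Threshold-linear factor activities with Poisson noise (illustrative only)."""
    total_input = W @ v + b                 # one real value per shared factor
    rates = np.maximum(total_input, 0.0)    # factors with negative values are ignored
    counts = rng.poisson(rates)             # non-negative integer activities
    return rates, counts

def reconstruct(counts, W):
    """Linear reconstruction from the factors that are currently active."""
    return W.T @ counts

rng = np.random.default_rng(0)
n_visible, n_factors = 20, 10               # hypothetical sizes

# Hypothetical parameters; in practice these would be learned.
W = rng.normal(scale=0.3, size=(n_factors, n_visible))
b = np.full(n_factors, -0.5)                # negative bias encourages sparsity here

v = rng.normal(size=n_visible)              # a stand-in data vector
rates, counts = factor_activities(v, W, b, rng)

active = rates > 0                          # the subset of factors this input selects
v_hat = reconstruct(counts, W)

# Each subset of active factors corresponds to one linear model.
n_possible_models = 2 ** n_factors

print("active factors:", np.flatnonzero(active))
print("integer activities:", counts)
print("number of possible linear models:", n_possible_models)
```

With 1000 factors in the pool, the same counting argument gives the 2^1000 linear models mentioned in the abstract, even though only the shared factor weights need to be stored.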