Activities and projects

The Manchester MLO Group conducts world-leading research into a wide range of techniques and applications of machine learning, optimization, data mining, probabilistic modelling, pattern recognition and machine perception. The group spans the field from new theoretical developments to large applications, and is currently supported by a number of research bodies, including EPSRC, BBSRC, and several industry partners.

We promote an active seminar culture (see our seminars page), maintain a Resources for Research collection, and are involved in teaching at undergraduate and MSc level.

Our research covers these areas:
  • Probabilistic models
  • Unsupervised learning
  • Optimisation algorithms
  • Feature selection
  • Speech recognition
  • Reinforcement learning
  • Ensemble methods
  • Evolutionary algorithms
  • Deep neural nets
  • Dynamical systems
  • Boosting
  • Online learning
  • Semi-supervised learning
  • Neural networks
  • Concept drift
  • Bayesian methods


We are currently involved in the teaching of the following modules:


The following is a partial list of past/present MLO group research projects. For more information, for example if you are wishing to apply to study for a PhD in these areas, please contact the specified project lead.

  • Constrained MultiObjective Optimisation

    MLO Contact: Joshua Knowles and Richard Allmendinger

    Our research is on optimization problems in which candidate solutions are evaluated by conducting physical or biochemical experiments. Such optimization processes may be subject to resourcing issues, as any experiment may require resources in order to be conducted. The primary issue we study arises when the resources required to conduct certain experiments are not consistently available throughout the optimization process; we model the dynamic availability of resources using what we call ephemeral resource constraints. The second resourcing issue is related to optimizing subject to changes of variables; here, as an example, consider the optimization of combinations of drugs drawn from a non-stationary library. The final resourcing issue is related to optimization in lethal environments. The aim here is to evolve a population of hardware entities, such as autonomous robot capsules, nano-machines, or drone planes, which can be accidentally destroyed if the wrong software (the EA solution) is uploaded to them. Hence, the population itself is at risk of shrinking if too aggressive a search is used. Our objective is to understand how these resourcing issues affect evolutionary search, and to develop effective and efficient search strategies for dealing with them.

  • Computational Modelling of Biochemical Networks

    MLO Contact: Pedro Mendes

    Computational modeling and simulation of biochemical networks is at the core of systems biology and this includes many types of analyses that can aid understanding of how these systems work. COPASI is a generic software package for modeling and simulation of biochemical networks which provides many of these analyses in convenient ways that do not require the user to program or to have deep knowledge of the numerical algorithms. COPASI is a flexible framework capable of: steady-state and time-course simulations, stoichiometric analyses, parameter scanning, sensitivity analysis (including metabolic control analysis), global optimization, parameter estimation, and stochastic simulation.

    • Mendes P, Hoops S, Sahle S, Gauges R, Dada J, Kummer U. Computational modeling of biochemical networks using COPASI. Methods in molecular biology (Clifton, N.J.). 2009; 500: 17-59. [PDF]
  • Machine Learning for Adaptive Multi-Core Machines

    MLO Contact: Gavin Brown - (or visit the project website)

    The computer industry is undergoing the "multi-core" revolution. When you buy a PC off the shelf these days, it is inevitably "dual-core" or "quad-core". This idea of more and more CPU "cores" executing in parallel is expected to continue to the hundreds and thousands. The problem of coordinating these cores is challenging and unsolved. The iTLS project applies Machine Learning technologies to address this problem.

  • Dynamical Systems Analysis of Non-Stationary Learning

    MLO Contact: Jon Shapiro and Joe Mellor

    Online learning is the process in which a learning agent receives the data from which to learn one example at a time. It is most useful in two main contexts: in scenarios where the dataset is so vast that considering all of its data points at once becomes intractable, and in situations where the agent receives data as a stream in real time and so must learn while responding to the environment. In both situations, the distribution from which the training data is drawn can change over time; this is called a non-stationary environment. The changing environment and the learning algorithm can be viewed as dynamical systems, and our research pursues this view in the context of Iterated Function Systems. The overall hypothesis of the project is that objects of study in the theory of dynamical systems can be used to analyse online learning algorithms in non-stationary environments. This could lead to a better understanding of the convergence properties of algorithms in certain environments and potentially allow for better design of these online algorithms.
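
    As a minimal illustration of online learning under a changing distribution, the sketch below tracks the mean of a stream whose true mean jumps halfway through. The stream, the step sizes and the function names are all assumptions made for this example and are not taken from the project. With a constant step size each update is an affine contraction of the current estimate, which is one simple way to read such a learner as a repeated application of maps.

```python
import random


def online_mean(stream, step_size=None):
    """Update an estimate one example at a time.

    step_size=None uses 1/t, which gives the running sample average;
    a constant step size forgets old examples exponentially fast.
    """
    estimate = 0.0
    history = []
    for t, x in enumerate(stream, start=1):
        alpha = 1.0 / t if step_size is None else step_size
        estimate += alpha * (x - estimate)  # move towards the new example
        history.append(estimate)
    return history


def drifting_stream(n, seed=0):
    """Hypothetical non-stationary source: the mean jumps from 0 to 5 halfway."""
    rng = random.Random(seed)
    return [rng.gauss(0.0 if t < n // 2 else 5.0, 1.0) for t in range(n)]


stream = drifting_stream(2000)
averaging = online_mean(stream)                 # step size 1/t
tracking = online_mean(stream, step_size=0.05)  # constant step size

print(f"sample average ends at {averaging[-1]:.2f}")
print(f"constant step ends at  {tracking[-1]:.2f}")
```

    The sample average settles between the two regimes, while the constant-step learner follows the new mean at the price of a noisier estimate.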

  • Guaranteed Approximation and Convergence in Multiobjective Optimization

    MLO Contact: Joshua Knowles

    Approximation algorithms, which are methods that deliver solutions guaranteed in the worst case to be no more than a fixed amount epsilon away from optimal, are well-known in single-objective optimization. In multiobjective optimization, the concept of approximation must be extended in two ways: to cover vector fitness values; and, to cover sets. Thanks to work by Yannakakis and Papadimitriou[1], and by Laumanns et al[2], we have the notions of an epsilon-approximation set and an epsilon-Pareto approximation set, which are alternative types of approximation to a Pareto optimal set. In ongoing work (that dates back to some of my PhD[3] studies in 1999 onwards), I am investigating what types of algorithm give guaranteed approximation to a Pareto front, and with what type of convergence. I am particularly interested in the case where the epsilon value is not selected a priori by the user, but is adapted during optimization to give the closest possible approximation. In recent work with López-Ibáñez and Laumanns[4], we were able to characterize both theoretically and empirically the approximation and convergence properties of several of the most common archiving algorithms used in multiobjective optimization. Two of the currently best methods are hypervolume-based archiving[3,5], and archiving based on a hierarchical, adaptive grid[6].

    1. C. H. Papadimitriou, M. Yannakakis. The complexity of tradeoffs, and optimal access of web sources. FOCS, 2000.
    2. M. Laumanns, L. Thiele, K. Deb, E. Zitzler. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10(3): 263-282, 2002.
    3. J. Knowles. Local-search and hybrid evolutionary algorithms for Pareto optimization. PhD thesis, University of Reading, 2002.
    4. M. López-Ibáñez, J. Knowles, M. Laumanns. On sequential online archiving of objective vectors. Evolutionary Multi-Criterion Optimization, LNCS 6576: 46-60, 2011.
    5. J. Knowles, D. Corne. Properties of an adaptive archiving algorithm for storing nondominated vectors. IEEE Transactions on Evolutionary Computation, 7(2):100-116, 2003.
    6. M. Laumanns, R. Zenklusen. Stochastic convergence of random search methods to fixed size Pareto front approximations. European Journal of Operational Research 213(2): 414-421, 2011.
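
    To make the idea of an epsilon-approximation set concrete, here is a small sketch for minimization problems. It assumes the additive form of epsilon-dominance (a multiplicative form is also common), and the function names and the toy front are invented for illustration. The last function computes the smallest epsilon for which an archive approximates a known front, which is the quantity one would like an adaptive scheme to drive down.

```python
def eps_dominates(a, b, eps):
    """True if a is no worse than b by more than eps on every objective."""
    return all(ai - eps <= bi for ai, bi in zip(a, b))


def is_eps_approximation(archive, front, eps):
    """True if every front point is eps-dominated by some archive point."""
    return all(any(eps_dominates(a, p, eps) for a in archive) for p in front)


def smallest_eps(archive, front):
    """Smallest additive eps for which the archive eps-approximates the front."""
    return max(
        min(max(ai - pi for ai, pi in zip(a, p)) for a in archive)
        for p in front
    )


# Toy bi-objective example (both objectives minimized).
front = [(0.0, 4.0), (1.0, 2.0), (2.0, 1.0), (4.0, 0.0)]
archive = [(0.0, 4.0), (2.0, 1.0)]

eps = smallest_eps(archive, front)
print(eps)
print(is_eps_approximation(archive, front, eps))
print(is_eps_approximation(archive, front, eps / 2))
```

    In practice the true front is unknown, so a check like this is only possible on benchmark problems; it is shown here purely to pin down the definition.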
  • Cluster Ensembles for Temporal Data

    MLO Contact: Ke Chen

    As an emerging area in machine learning, clustering ensemble approaches have recently been studied from different perspectives. The basic idea behind a clustering ensemble is to combine multiple partitions of the same data set to produce a consensus partition that is expected to be superior to any of the input partitions. Both empirical and theoretical studies suggest that clustering ensembles provide an alternative technique for overcoming the weaknesses of individual clustering algorithms. In our work, we propose a weighted clustering ensemble approach, guided by clustering validation criteria, that reconciles initial partitions into candidate consensus partitions from different perspectives and then into a final partition. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. As our weighted cluster ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitations by formal analysis and empirical studies. Separately, temporal data clustering provides underpinning techniques for discovering the intrinsic structure of temporal data and condensing the information in it, but the representation-based temporal data clustering methodology is subject to a fundamental weakness: information loss. The joint use of different representations under our proposed weighted clustering ensemble framework effectively overcomes this weakness by exploiting various information sources underlying temporal data. Our approach has been applied to benchmark time series, motion trajectory and time-series data stream clustering tasks. In our ongoing studies, we work on formal analysis to justify the effectiveness of clustering ensembles and on developing novel yet theoretically justifiable clustering ensemble approaches.

    • Yang Y. & Chen K., Temporal data clustering via weighted clustering ensemble with different representations.
      IEEE Transactions on Knowledge and Data Engineering 23(2): 307-320, 2011. [PDF]
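
    The general idea of combining partitions can be sketched with a weighted co-association matrix, in which each entry records the weighted fraction of partitions that place two points in the same cluster. This is a generic consensus construction and not the algorithm from the paper above; the weights below are hypothetical stand-ins for scores that a clustering validation criterion might supply, and the threshold of 0.5 is likewise an assumption.

```python
def coassociation(partitions, weights):
    """Weighted fraction of partitions that put points i and j together."""
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus(matrix, threshold=0.5):
    """Link points whose co-association exceeds the threshold and
    return the connected components as the consensus partition."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three input partitions of six points; cluster labels are arbitrary names.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, different label names
    [0, 0, 1, 1, 2, 2],  # a dissenting partition
]
weights = [0.4, 0.4, 0.2]  # hypothetical validation-based weights

matrix = coassociation(partitions, weights)
final = consensus(matrix)
print(final)
```

    Because the consensus is read off connected components, the number of clusters falls out of the data rather than being fixed in advance.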
  • Semi-Supervised Boosting Learning

    MLO Contact: Ke Chen

    Semi-supervised learning concerns the problem of learning in the presence of labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. However, none of them takes all three semi-supervised assumptions, i.e., the smoothness, cluster and manifold assumptions, into account together during boosting learning. In this work, we proposed a novel cost functional consisting of the margin cost on labeled data and a regularization penalty on unlabeled data based on the three fundamental semi-supervised assumptions. Minimizing this cost functional with a greedy, stage-wise functional optimization procedure leads to a generic boosting framework for semi-supervised learning. In extensive experiments we demonstrated that our algorithm yields favorable results on benchmark and real-world classification tasks in comparison to state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms. In our ongoing studies, we work on formal analysis of our proposed semi-supervised boosting framework and on exploiting other useful information sources for semi-supervised learning.

    • Chen K. & Wang S., Semi-supervised Learning via Regularized Boosting Working on Multiple Semi-supervised Assumptions.
      IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1): 129-143, 2011. [PDF]
  • Speech Information Component Analysis with Deep Learning

    MLO Contact: Ke Chen

    It is well known that speech conveys various yet mixed kinds of information: predominantly linguistic information, but also non-verbal speaker-specific and emotional information components. For human communication, all the information components in speech turn out to be very useful, and each is used separately for different tasks. For example, a listener often recognizes a speaker regardless of what is spoken (speaker recognition), while effortlessly understanding exactly what is spoken by different speakers (speech recognition). In general, however, there is no effective way to automatically extract an information component of interest from speech, so the same representation has to be used in different speech information tasks. The interference between different yet entangled speech information components in most existing acoustic representations hinders a speech or speaker recognition system from achieving better performance. Recent studies in machine learning reveal that learning deep architectures provides a new way of tackling complex AI problems. In our work, we have proposed a novel deep neural architecture for learning intrinsic speaker-specific characteristics. To this end, multi-objective loss functions are proposed for learning speaker-specific characteristics and for regularization, by normalizing the interference of non-speaker-related information and avoiding information loss. We have demonstrated that the resulting speaker-specific representation is insensitive to the text and language spoken and to environmental mismatches, and hence outperforms MFCCs and other state-of-the-art techniques in speaker recognition. In our ongoing work, we are developing novel, biologically inspired deep architectures for speech information component analysis, with the aim of extracting different task-specific information components of interest from speech and applying them to various speech information processing tasks.

    • Chen K. & Salman A., Learning speaker-specific characteristics with a deep neural architecture.
      IEEE Transactions on Neural Networks and Learning Systems, vol. 23, 2012. (to appear) [PDF]
    • Chen K. & Salman A., Extracting speaker-specific information with a regularized Siamese deep network.
      Advances in Neural Information Processing Systems 25 (NIPS'11), MIT Press, 2011. [PDF]
  • Boosting as a Product of Experts

    MLO Contact: Nara Edakunni and Gavin Brown

    Our research focuses on developing a probabilistic model for boosting and framing the learning updates as a form of incremental model adaptation by adding new experts to the ensemble. A probabilistic framework for boosting provides a number of advantages including a simple and well motivated model of the data. Furthermore, it makes the modeling assumptions made in boosting explicit and allows us to seamlessly apply boosting across different problem settings by varying the probabilistic model of the constituent experts. A probabilistic model of boosting also enables us to use a plethora of inference techniques like likelihood maximization and Bayesian inference to learn the parameters of the model. In a recent paper, we have shown that boosting corresponds to a Product of Experts model which is a normalized product of probabilities with the component probabilities being contributed by the experts in the ensemble. The ensemble of experts is expanded at each iteration by adding a new expert such that the likelihood of the observed data, as predicted by the ensemble does not decrease with the addition of an expert. We show that such a condition of non-decreasing likelihood at each iteration naturally leads to a constraint on the parameters of the expert similar to the famous weak learning criteria in boosting. For a specific parametrization of the expert probabilities we can also show that incremental learning in PoE reduces to a variant of the AdaBoost algorithm.

    • N. Edakunni, G. Brown, and T. Kovacs, "Boosting as a Product of Experts"
      Uncertainty in Artificial Intelligence, 2011
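
    The notion of a normalized product of expert probabilities can be illustrated for binary labels. The parametrization below, in which each expert casts a vote in {-1, +1} with a confidence alpha, is an assumption chosen for this sketch and is not claimed to be the one used in the paper. Under that assumption the product of experts collapses to a logistic function of the weighted vote, so its decision agrees with the sign of a boosting-style weighted sum.

```python
import math


def expert_prob(y, vote, alpha):
    """p_k(y | x) for one expert that votes in {-1, +1} with confidence alpha."""
    return math.exp(alpha * y * vote) / (
        math.exp(alpha * vote) + math.exp(-alpha * vote)
    )


def poe_prob(y, votes, alphas):
    """Normalized product of the experts' probabilities for label y."""
    def unnormalised(label):
        return math.prod(expert_prob(label, v, a) for v, a in zip(votes, alphas))

    return unnormalised(y) / (unnormalised(+1) + unnormalised(-1))


# Hypothetical votes of three experts on a single input, with their confidences.
votes = [+1, -1, +1]
alphas = [0.8, 0.3, 0.5]

p_plus = poe_prob(+1, votes, alphas)
weighted_vote = sum(a * v for a, v in zip(alphas, votes))
logistic = 1.0 / (1.0 + math.exp(-2.0 * weighted_vote))

print(f"PoE probability of +1: {p_plus:.4f}")
print(f"logistic(2 * vote):    {logistic:.4f}")
```

    Adding an expert multiplies in one more factor before renormalizing, which is the sense in which the ensemble grows incrementally.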
  • Ensemble Diversity in Non-Stationary Environments

    MLO Contact: Gavin Brown and Richard Stapenhurst

    The 'diversity' of an ensemble quantifies how different the individual learners within the ensemble are. It is widely accepted that a good ensemble must have some diversity, but that too much diversity will be detrimental to performance. We are examining the relationship between diversity and the distribution of voting margins, which seems to suggest a far more straightforward interpretation and application of diversity than is common in the literature. Another facet of our research involves non-stationary learning, where we wish to model some process that changes over time, and the application of diversity to this problem. Ensembles have been shown to perform well on non-stationary problems, but generally the techniques for adapting to new concepts are somewhat heuristic, or require tuned parameters. We have shown that the diversity of an ensemble determines its ability to adapt; future work focuses on exploiting this observation to produce state-of-the-art techniques.

  • Ensemble Methods and Diversity

    MLO Contact: Gavin Brown

    Ensemble systems are groups of predictors treated as a 'committee' in order to obtain better generalisation than any single predictor, and have emerged as one of the most powerful pattern recognition techniques of the last decade. The success of such methods rests on the committee members exhibiting some kind of 'diversity'. Our work has analyzed diversity at a fundamental level, with new observations on how it can be formulated and exploited. The primary contribution was a new understanding of the Negative Correlation learning algorithm, showing that it is capable of explicitly managing diversity.

    • G. Brown, J. Wyatt, P. Tino, Managing Diversity in Regression Ensembles [PDF]
      JMLR vol 6 (2006).
    • Diversity Creation Methods: A Survey and Categorisation, Brown, Wyatt, Harris, Yao. [PDF]
      Journal of Information Fusion, vol 6, 2005
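
    One standard way to see why diversity matters in regression ensembles is the identity often called the ambiguity decomposition: for a uniformly averaged ensemble, the squared error of the ensemble equals the average squared error of its members minus the average squared spread of the members around the ensemble output. The sketch below checks this numerically on made-up predictions; it illustrates the identity only and is not an implementation of Negative Correlation learning.

```python
def ambiguity_decomposition(predictions, target):
    """Return (ensemble error, average member error, ambiguity) for one example."""
    m = len(predictions)
    ensemble = sum(predictions) / m
    ensemble_error = (ensemble - target) ** 2
    average_error = sum((f - target) ** 2 for f in predictions) / m
    ambiguity = sum((f - ensemble) ** 2 for f in predictions) / m
    return ensemble_error, average_error, ambiguity


# Hypothetical outputs of three regressors for a single input with target 4.0.
predictions = [2.0, 3.5, 5.0]
target = 4.0

ensemble_error, average_error, ambiguity = ambiguity_decomposition(predictions, target)
print(ensemble_error, average_error, ambiguity)
print(abs(ensemble_error - (average_error - ambiguity)) < 1e-12)
```

    Since the ambiguity term is never negative, the averaged ensemble is never worse than its average member, and more spread among equally accurate members lowers the ensemble error further.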
  • Information Theoretic Feature Selection

    MLO Contact: Gavin Brown and Adam Pocock

    Our current research is on developing a novel theoretical foundation for mutual information based feature selection. We begin by defining a discriminative model, and aim to maximise the joint likelihood of this model. We derive an information theoretic feature selection term from this model which, when minimised, maximises the model likelihood. We show that when using a flat prior over the features, this feature selection term is exactly that optimised by a group of Markov Blanket discovery algorithms, and is approximated by a large group of mutual information based filters. In recent work we use this likelihood perspective to investigate the properties of mutual information based filters, and to understand the probabilistic assumptions they impose on the data. We thus provide a unifying perspective that brings two decades of literature into a common framework. Our probabilistic interpretation of feature selection leads to several natural extensions for cost sensitivity and for incorporating domain knowledge. We are now looking at using informative priors to guide feature selection by modifying these filter criteria to include domain knowledge. This gives a family of selection criteria which can include domain knowledge about the size of the feature set, or the relationships between the features and the class label.
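
    For readers unfamiliar with mutual information based filters, the simplest member of that family ranks each feature by its empirical mutual information with the class label. The sketch below does this for small discrete data. It is only the most basic filter, not the criterion derived in this project, and the toy features and their names are invented for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "copy_of_label": [0, 0, 0, 0, 1, 1, 1, 1],
    "noisy": [0, 0, 0, 1, 1, 1, 1, 0],
    "irrelevant": [0, 1, 0, 1, 0, 1, 0, 1],
}

# Rank features from most to least informative about the label.
scores = {name: mutual_information(values, labels) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f} bits")
```

    Ranking features one at a time ignores redundancy and interactions between them, which is the kind of implicit assumption the likelihood perspective above is meant to expose.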
