The Algebra of Contingency Tables & Statistical Applications
- Speaker: Dr. Lawrence Cox (National Institute for Statistical Research)
- Host: John Keane
- 5th November 2013 at 16:00 in University Place 3.204
Many problems on contingency tables, such as obtaining exact bounds for suppressed entries, could be solved using integer optimization. Other problems, inherently statistical, could be solved through a combination of integer optimization and probabilistic methods. The central difficulty is that integer optimization in general is an NP-hard problem, so that solutions are often not computable in practical terms. A notable exception is when the underlying mathematical structure of the table can be represented by a network. Networks enable solutions in quadratic to cubic time, and many tables familiar to statistical applications are of network type. We describe three statistical problems?sampling contingency tables subject to known minimal sufficient statistics, obtaining exact bounds for suppressed table entries, and imputation in tables subject to prescribed bounds on table entries. For the sampling problem, the approach involves a combination of network modeling to equate moves between solutions (tables) with extreme points of a convenient polytope, optimization over the polytope with respect to an appropriate random cost function, and Markov Chain Monte Carlo to ensure that the appropriate distribution over the set of admissible tables is preserved. The MCMC step was introduced in a seminal paper of Diaconis and Sturmfels (1998). Tables here can be 2-, 3- or many-way tables, hierarchies of tables, or linked tables.