Semi-automatic selection of summary statistics for ABC model choice

Dennis Prangle, Paul Fearnhead, Murray P. Cox, Patrick J. Biggs and Nigel P. French
Published/Copyright: December 10, 2013

Abstract

A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, standard methods based on evaluating the models' likelihood functions cannot be applied because these functions are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative in such situations. ABC simulates data x for many parameter values under each model and compares each simulated data set to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown that the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step in which many x values are simulated from all models and regressions are fitted to these simulations with the model as response. The resulting model weight estimators are used as S in an ABC analysis. Theoretical results are given to justify this as approximating low-dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.
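The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two toy models (zero-mean normal versus Laplace data with a uniform prior on the standard deviation), the hand-picked features, the NumPy-only ridge-stabilised logistic regression, and the names `simulate`, `features`, `fit_logistic`, `summary`, `reference_table` and `abc_model_prob`. The fitted probability of model 1 plays the role of the summary statistic S, and a simple rejection step stands in for the ABC analysis.

```python
import numpy as np

N_OBS = 200  # observations per data set (toy value, an assumption)

def simulate(model, rng):
    """Draw a scale from its prior, then one data set under the chosen toy model."""
    scale = rng.uniform(0.5, 2.0)  # prior on the standard deviation (assumed)
    if model == 0:
        return rng.normal(0.0, scale, N_OBS)
    # Laplace with the same variance as model 0, so only the shape differs
    return rng.laplace(0.0, scale / np.sqrt(2.0), N_OBS)

def features(x):
    """Candidate regression covariates f(x): logs of absolute moments (assumed)."""
    ax = np.abs(x)
    return np.log([ax.mean(), (ax**2).mean(), (ax**4).mean()])

def fit_logistic(F, y, n_iter=25, ridge=1e-3):
    """Logistic regression of the model indicator on features, by Newton steps."""
    X = np.column_stack([np.ones(len(F)), F])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - p) - ridge * beta)
    return beta

def summary(x, beta):
    """S(x): fitted probability that x was generated by model 1."""
    z = beta[0] + features(x) @ beta[1:]
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def reference_table(beta, n_sim, rng):
    """Simulate (model, S(x)) pairs under a uniform prior over the two models."""
    models = rng.integers(0, 2, n_sim)
    s = np.array([summary(simulate(m, rng), beta) for m in models])
    return models, s

def abc_model_prob(x_obs, beta, models, s, keep=0.01):
    """Rejection ABC: keep the simulations whose S(x) is closest to S(x_obs)."""
    dist = np.abs(s - summary(x_obs, beta))
    n_keep = max(1, int(keep * len(s)))
    accepted = models[np.argsort(dist)[:n_keep]]
    return accepted.mean()  # estimated posterior probability of model 1

rng = np.random.default_rng(1)

# Preliminary step: simulate from both models and regress the model indicator.
train_models = rng.integers(0, 2, 4000)
F = np.array([features(simulate(m, rng)) for m in train_models])
beta = fit_logistic(F, train_models.astype(float))

# ABC step: use the fitted model weight estimator as the summary statistic.
models, s = reference_table(beta, 20000, rng)
x_obs = simulate(1, rng)  # stand-in for observed data
print(abc_model_prob(x_obs, beta, models, s))
```

Reusing one reference table for several observed data sets keeps repeated analyses cheap. The acceptance fraction `keep` trades Monte Carlo error against approximation error and would need tuning in any real application.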


Corresponding author: Dennis Prangle, Department of Mathematics and Statistics, Lancaster University, UK

The authors acknowledge the Marsden Fund project 08-MAU-099 (Cows, starlings and Campylobacter in New Zealand: unifying phylogeny, genealogy, and epidemiology to gain insight into pathogen evolution) for funding this project. This publication made use of the Campylobacter Multi Locus Sequence Typing website (http://pubmlst.org/campylobacter/) developed by Keith Jolley and sited at the University of Oxford (Jolley and Maiden 2010, BMC Bioinformatics, 11:595). The development of this site has been funded by the Wellcome Trust. The paper has benefited from many helpful suggestions of two anonymous reviewers.

Published Online: 2013-12-10
Published in Print: 2014-02-01

©2014 by Walter de Gruyter Berlin Boston

Downloaded on 6.2.2026 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2013-0012/html