Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics

Radu Herbei; Laura Kubatko

doi:10.1515/sagmb-2012-0023


Published/Copyright: March 26, 2013


From the journal Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 1

Abstract

Markov chains are widely used for modeling in many areas of molecular biology and genetics. As the complexity of such models advances, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
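As a point of reference for the quantity discussed in the abstract, the total variation distance between a distribution mu and the stationary distribution pi on a finite state space is one half of the sum over states x of |mu(x) - pi(x)|. The sketch below is not the method proposed in the paper. It is a naive plug-in Monte Carlo estimate on a toy chain, and the chain (a lazy random walk on a cycle), the function names, and all parameter values are assumptions chosen for illustration only. The toy chain has a known uniform stationary distribution, so the estimate can be checked against an exact calculation.

```python
import random
from collections import Counter


def step(state, n, rng):
    """One step of a lazy random walk on a cycle with n states (toy chain, an assumption)."""
    u = rng.random()
    if u < 0.5:
        return state              # hold with probability 1/2
    if u < 0.75:
        return (state + 1) % n    # move clockwise with probability 1/4
    return (state - 1) % n        # move counter-clockwise with probability 1/4


def estimate_tv(n, t, num_chains, seed=0):
    """Plug-in Monte Carlo estimate of the TV distance to uniform after t steps.

    Runs num_chains independent copies of the chain from state 0 and compares
    the empirical distribution of their time-t states with the uniform
    stationary distribution.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_chains):
        x = 0
        for _ in range(t):
            x = step(x, n, rng)
        counts[x] += 1
    pi = 1.0 / n
    return 0.5 * sum(abs(counts.get(s, 0) / num_chains - pi) for s in range(n))


def exact_tv(n, t):
    """Exact TV distance to uniform after t steps, by propagating the distribution."""
    dist = [0.0] * n
    dist[0] = 1.0
    for _ in range(t):
        new = [0.0] * n
        for s, p in enumerate(dist):
            new[s] += 0.5 * p
            new[(s + 1) % n] += 0.25 * p
            new[(s - 1) % n] += 0.25 * p
        dist = new
    return 0.5 * sum(abs(p - 1.0 / n) for p in dist)


if __name__ == "__main__":
    for t in (0, 5, 20):
        print(t, exact_tv(8, t), estimate_tv(8, t, num_chains=20000))
```

A plug-in estimate of this kind is only usable when the number of simulated chains greatly exceeds the number of states, and it is biased upward because sampling noise in every cell adds to the sum of absolute differences. That limitation is one way to see why large state spaces, such as spaces of phylogenetic trees, call for a more careful estimator.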

Keywords: GPU computing; mixing time; phylogenetics; total variation distance

Corresponding author: Laura Kubatko, The Ohio State University, Statistics, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA, Phone: +1-614-247-8846, Fax: +1-614-292-2096

We acknowledge computing support from the Ohio Supercomputer Center (http://www.osc.edu/).

Conflict of interest statement

Funding: The first author is supported in part by the National Science Foundation award DMS-1209142. The second author is supported in part by the National Science Foundation award DMS-1106706.


Published Online: 2013-03-26
