Abstract
Markov chains are widely used for modeling in many areas of molecular biology and genetics. As these models grow more complex, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be implemented efficiently using GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
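The abstract does not spell out the estimator, so the following is not the authors' method. It is a minimal sketch, assuming a small toy chain (a lazy random walk on a cycle of n states, whose stationary distribution π is uniform) and hypothetical function names, of the naive plug-in approach: simulate many independent copies of the chain for t steps from a fixed state x, form the empirical distribution of X_t, and substitute it into the definition ||P^t(x, ·) − π||_TV = (1/2) Σ_y |P^t(x, y) − π(y)|. The exact value is computed alongside for comparison, which is only feasible because this toy state space is small.

```python
import random
from collections import Counter


# Hypothetical toy chain (not from the paper): lazy random walk on Z_n.
def lazy_cycle_step(x, n, rng):
    """Stay put with prob. 1/2, else step to x+1 or x-1 (mod n) with prob. 1/4 each."""
    u = rng.random()
    if u < 0.5:
        return x
    return (x + 1) % n if u < 0.75 else (x - 1) % n


def exact_tv(n, t, x0=0):
    """Exact ||P^t(x0, .) - pi||_TV for the toy chain, with pi uniform on n states.

    Propagates the distribution vector t steps; only feasible for small n.
    """
    p = [0.0] * n
    p[x0] = 1.0
    for _ in range(t):
        q = [0.0] * n
        for y in range(n):
            q[y] += 0.5 * p[y]
            q[(y + 1) % n] += 0.25 * p[y]
            q[(y - 1) % n] += 0.25 * p[y]
        p = q
    return 0.5 * sum(abs(py - 1.0 / n) for py in p)


def mc_tv(n, t, n_chains, x0=0, seed=1):
    """Naive plug-in Monte Carlo estimate of ||P^t(x0, .) - pi||_TV.

    Runs n_chains independent copies for t steps, then compares the empirical
    distribution of X_t with the uniform stationary distribution.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_chains):
        x = x0
        for _ in range(t):
            x = lazy_cycle_step(x, n, rng)
        counts[x] += 1
    return 0.5 * sum(abs(counts[y] / n_chains - 1.0 / n) for y in range(n))
```

For example, mc_tv(20, 300, 5000) typically lands well above the near-zero exact_tv(20, 300). This gap illustrates why the naive estimator struggles on large spaces: the empirical distribution is unbiased for P^t(x, ·) and the TV distance is convex, so by Jensen's inequality the plug-in estimate is biased upward, and the bias grows roughly with the number of states relative to the number of simulated chains.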
We acknowledge computing support from the Ohio Supercomputer Center (http://www.osc.edu/).
Conflict of interest statement
Funding: The first author is supported in part by the National Science Foundation award DMS-1209142. The second author is supported in part by the National Science Foundation award DMS-1106706.
©2013 by Walter de Gruyter Berlin Boston
Articles in the same Issue
- Studying the evolution of transcription factor binding events using multi-species ChIP-Seq data
- Approximate Bayesian computation with functional statistics
- Monte Carlo estimation of total variation distance of Markov chains on large spaces, with application to phylogenetics
- Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data
- Flexible pooling in gene expression profiles: design and statistical modeling of experiments for unbiased contrasts
- On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo
- Inferring latent gene regulatory network kinetics