Abstract
Markov chains are widely used for modeling in many areas of molecular biology and genetics. As such models grow more complex, it becomes increasingly important to assess the rate at which a Markov chain converges to its stationary distribution in order to carry out accurate inference. A common measure of convergence to the stationary distribution is the total variation distance, but this measure can be difficult to compute when the state space of the chain is large. We propose a Monte Carlo method to estimate the total variation distance that can be applied in this situation, and we demonstrate how the method can be efficiently implemented by taking advantage of GPU computing techniques. We apply the method to two Markov chains on the space of phylogenetic trees, and discuss the implications of our findings for the development of algorithms for phylogenetic inference.
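To make the quantity in question concrete, the following is a minimal sketch and not the estimator proposed in the paper. It uses a toy chain chosen purely for illustration (a lazy random walk on a cycle of `n` states, whose stationary distribution is uniform) and compares the exact total variation distance after `t` steps with a naive plug-in estimate built from the empirical distribution of many independently simulated chains. All function names are hypothetical. The plug-in estimate is only workable when the state space is small enough to tabulate, which is exactly the limitation that motivates more careful methods on large spaces such as tree space.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step_distribution(dist, n):
    """Apply one step of the lazy walk on an n-cycle to a distribution.

    Toy chain (an assumption for illustration): stay with probability 1/2,
    move one position left or right with probability 1/4 each.
    """
    return [0.5 * dist[i] + 0.25 * dist[(i - 1) % n] + 0.25 * dist[(i + 1) % n]
            for i in range(n)]


def exact_tv(n, t, start=0):
    """Exact TV distance between the time-t distribution and the uniform law."""
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(t):
        dist = step_distribution(dist, n)
    return tv_distance(dist, [1.0 / n] * n)


def simulate_chain(n, t, rng, start=0):
    """Run one realization of the lazy walk for t steps and return its final state."""
    state = start
    for _ in range(t):
        u = rng.random()
        if u < 0.25:
            state = (state - 1) % n
        elif u < 0.5:
            state = (state + 1) % n
        # otherwise the chain stays where it is
    return state


def monte_carlo_tv(n, t, num_chains, seed=0):
    """Naive plug-in estimate: TV between the empirical time-t law and uniform."""
    rng = random.Random(seed)
    counts = Counter(simulate_chain(n, t, rng) for _ in range(num_chains))
    empirical = [counts[i] / num_chains for i in range(n)]
    return tv_distance(empirical, [1.0 / n] * n)


if __name__ == "__main__":
    for t in (0, 5, 20, 50):
        print(t, exact_tv(8, t), monte_carlo_tv(8, t, num_chains=20000))
```

Because every simulated chain is independent of the others, the simulation loop is the part that parallelizes naturally, for example across GPU threads. The plug-in estimate is biased upward when the number of chains is small relative to the number of states, so it should be read as an illustration of the quantity rather than as a recommended estimator.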
We acknowledge computing support from the Ohio Supercomputer Center (http://www.osc.edu/).
Conflict of interest statement
Funding: The first author is supported in part by the National Science Foundation award DMS-1209142. The second author is supported in part by the National Science Foundation award DMS-1106706.
©2013 by Walter de Gruyter Berlin Boston