Statistical issues associated with modeling of synonymous mutation data

Snehalata Huzurbazar; Sarabdeep Singh; Jessica A. Schlueter

doi:10.1515/sagmb-2012-0033


Published/Copyright: 24 April 2013


From the journal Statistical Applications in Genetics and Molecular Biology, Volume 12, Issue 3

Abstract

The explosion of data in evolutionary bioinformatics has sometimes led to ad hoc, incomplete, and even inaccurate data analyses. Using dS data, that is, data on the number of synonymous substitutions per synonymous site, we work through a statistical analysis for modeling the time since gene duplication. We explore the shortcomings of previous analyses, especially their effect on inference about the gene duplication process. We present a statistical analysis that respects the assumptions of the models and the integrity of the data, and we emphasize that exploratory data analysis, formulation of a data model, model estimation, and, finally, model assessment are all important steps in a complete data analysis. Furthermore, for dS data, we develop Bayesian discrete-continuous mixture models and present analyses using two genomes.

Keywords: Bayesian models; truncation; discrete-continuous mixture distribution; distributional fit; Weibull
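The abstract refers to discrete-continuous mixture models for dS data, and the keywords mention truncation and the Weibull distribution. The sketch below is a hypothetical illustration of how those pieces can combine in a likelihood; it is not the authors' model. It assumes, for illustration only, that each dS value is either exactly 0 (a discrete point mass with probability `p_zero`) or drawn from a Weibull(`shape`, `scale`) distribution right-truncated to (0, `upper`], where `upper` stands for an assumed cutoff on dS. All function and parameter names are illustrative.

```python
import math


def weibull_cdf(x, shape, scale):
    """CDF of a Weibull(shape, scale) distribution, valid for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_logpdf(x, shape, scale):
    """Log density of a Weibull(shape, scale) distribution, valid for x > 0."""
    z = x / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def mixture_loglik(ds_values, p_zero, shape, scale, upper):
    """Log-likelihood of a hypothetical zero-inflated, right-truncated Weibull model.

    Assumption (illustrative only): each dS value equals 0 with probability
    p_zero; otherwise it follows a Weibull(shape, scale) truncated to (0, upper].
    """
    if not 0.0 < p_zero < 1.0:
        raise ValueError("p_zero must lie strictly between 0 and 1")
    # Normalising constant of the truncated continuous part: log P(X <= upper).
    log_trunc_mass = math.log(weibull_cdf(upper, shape, scale))
    total = 0.0
    for x in ds_values:
        if x == 0.0:
            # Discrete part: point mass at dS = 0.
            total += math.log(p_zero)
        elif 0.0 < x <= upper:
            # Continuous part: Weibull density renormalised over (0, upper].
            total += math.log1p(-p_zero) + weibull_logpdf(x, shape, scale) - log_trunc_mass
        else:
            raise ValueError(f"dS value {x} lies outside [0, {upper}]")
    return total


# Usage with made-up dS values (not data from the article).
example = [0.0, 0.0, 0.12, 0.35, 0.8, 1.4]
print(mixture_loglik(example, p_zero=0.3, shape=1.5, scale=0.7, upper=2.0))
```

Treating exact zeros as a separate discrete component avoids evaluating a continuous density at a boundary where it may be zero or infinite, and dividing by P(X <= upper) keeps the continuous part a proper density after truncation. In a Bayesian fit, this log-likelihood would be added to log-priors on `p_zero`, `shape`, and `scale` and explored with, for example, MCMC; that step is not shown here.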

Corresponding author: Snehalata Huzurbazar, Statistical and Applied Mathematical Sciences Institute, 19 T.W. Alexander Drive, P.O. Box 14006, Research Triangle Park, NC 27709-4006, USA; Department of Statistics, University of Wyoming, Dept. 3332, 1000 E. University Ave, Laramie, WY 82071, USA; and Department of Statistics, North Carolina State University, 5109 SAS Hall, 2311 Stinson Drive, Raleigh, NC 27695-8203, USA

The three authors’ research was supported by a grant to the University of Wyoming from the National Science Foundation under grant DMS-1100615. Huzurbazar’s contribution was also based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Huzurbazar and Singh thank David Liberles and Anke Konrad for introducing them to the study of gene duplications and evolutionary bioinformatics in general. All the authors thank the associate editor Vincent Plagnol and two anonymous reviewers for very helpful suggestions.


Published Online: 2013-04-24

Published in Print: 2013-06-01
