Abstract
The explosion of data in evolutionary bioinformatics has led to data analyses that are sometimes ad hoc, incomplete, and even inaccurate. Using dS data, that is, data on the number of synonymous substitutions per synonymous site, we work through a statistical analysis for modeling the time since gene duplication. We explore the shortcomings of previous analyses, particularly their effect on inference for the gene duplication process. We present a statistical analysis that respects the assumptions of the models and the integrity of the data, and we emphasize that exploratory data analysis, formulation of a data model, estimation of that model, and finally assessment of the model are all important steps in a complete data analysis. Furthermore, for dS data, we develop Bayesian discrete-continuous mixture models and present analyses using two genomes.
The three authors’ research was supported by a grant to the University of Wyoming from the National Science Foundation under grant DMS-1100615. Huzurbazar’s contribution was also based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Huzurbazar and Singh thank David Liberles and Anke Konrad for introducing them to the study of gene duplications and evolutionary bioinformatics in general. All the authors thank the associate editor Vincent Plagnol and two anonymous reviewers for very helpful suggestions.
©2013 by Walter de Gruyter Berlin Boston
Articles in this issue
- Masthead
- Review
- Genetic model selection in genome-wide association studies: robust methods and the use of meta-analysis
- Research Articles
- Hierarchical Bayesian mixture modelling for antigen-specific T-cell subtyping in combinatorially encoded flow cytometry studies
- An extension of the Wilcoxon-Mann-Whitney test for analyzing RT-qPCR data
- Block-diagonal discriminant analysis and its bias-corrected rules
- Statistical issues associated with modeling of synonymous mutation data
- Sensitivity to prior specification in Bayesian genome-based prediction models
- Bayesian hierarchical graph-structured model for pathway analysis using gene expression data