Using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling
Rawane Samb, Khader Khadraoui, Pascal Belleau, Astrid Deschênes, Lajmi Lakhal-Chaieb and Arnaud Droit
Abstract
Genome-wide mapping of nucleosomes has revealed a great deal about the relationship between chromatin structure and the control of gene expression. Recent next-generation ChIP-chip and ChIP-Seq technologies have accelerated our understanding of the basic principles of chromatin organization. These technologies have taught us that nucleosomes play a crucial role in gene regulation by allowing physical access to transcription factors. Recent methods and experimental advancements allow the determination of nucleosome positions for a given genome area. However, most of these methods estimate the number of nucleosomes either by an EM algorithm with a BIC criterion or by an effective heuristic strategy. Here, we introduce a Bayesian method for identifying nucleosome positions. The proposed model is based on a Multinomial-Dirichlet classification and hierarchical mixture distributions. The number and the positions of nucleosomes are estimated using a reversible jump Markov chain Monte Carlo simulation technique. We compare the performance of our method on simulated data and on MNase-Seq data from Saccharomyces cerevisiae against the PING and NOrMAL methods.
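As an illustration of two ingredients named in the abstract, the following minimal Python sketch evaluates the log-likelihood of read positions under a Student-t mixture and computes the conjugate Dirichlet posterior mean of the mixture weights given allocation counts. It is not the authors' implementation: the function names, the fixed degrees of freedom (df=4), and the use of the posterior mean are assumptions made for illustration, and the reversible jump moves that change the number of components are not shown.

```python
import math


def student_t_logpdf(x, mu, sigma, df):
    """Log density of a location-scale Student-t distribution."""
    z = (x - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - (df + 1) / 2 * math.log1p(z * z / df))


def mixture_loglik(reads, weights, mus, sigmas, df=4):
    """Log-likelihood of read positions under a k-component t-mixture.

    df=4 is an illustrative assumption, not a value taken from the article.
    """
    total = 0.0
    for x in reads:
        # Per-component log(weight) + log(density), combined with log-sum-exp.
        terms = [math.log(w) + student_t_logpdf(x, m, s, df)
                 for w, m, s in zip(weights, mus, sigmas)]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def dirichlet_posterior_mean(counts, alpha):
    """Mean of Dirichlet(alpha + counts), the conjugate posterior of the
    mixture weights given multinomial allocation counts."""
    post = [a + n for a, n in zip(alpha, counts)]
    norm = sum(post)
    return [p / norm for p in post]
```

In a sampler of this kind, a quantity such as `mixture_loglik` would typically enter the acceptance ratio of each proposed move, while the Dirichlet update reflects how a Multinomial-Dirichlet prior combines with the number of reads allocated to each candidate nucleosome.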
Acknowledgments
We thank Frédéric Fournier for assistance with the preparation of the manuscript. We also thank Charles Joly Beauparlant for his advice on algorithm implementation and Fabien Claude Lamaze for assistance with the biological aspects. RJMCMC computations on simulated and biological data were performed on the supercomputer Colosse from Université Laval, managed by Calcul Québec and Compute Canada. The operation of this supercomputer is funded by the Canada Foundation for Innovation (CFI), Ministère de l’Économie, de l’Innovation et des Exportations du Québec (MEIE), RMGA and the Fonds de recherche du Québec – Nature et technologies (FRQ-NT).
Funding: Canadian Institutes of Health Research (Grant/Award number: IC513823).
References
Chen, H. and P. C. Boutros (2011): “Venndiagram: a package for the generation of highly-customizable venn and euler diagrams in r,” BMC Bioinformatics, 12, 35, URL http://www.biomedcentral.com/1471-2105/12/35.
Chen, K., L. Wang, M. Yang, J. Liu, C. Xin, S. Hu and J. Yu (2010): “Sequence signature of nucleosome positioning in caenorhabditis elegans,” Genomics Proteomics Bioinformatics, 8, 92–102. doi:10.1016/S1672-0229(10)60010-1
Deschênes, A., F. C. Lamaze, P. Belleau and A. Droit (2015): consensusSeekeR: Detection of consensus regions inside a group of experiments using genomic positions and genomic ranges, URL https://github.com/ArnaudDroitLab/consensusSeekeR, r package version 0.99.0.
Flores, O. and M. Orozco (2011): “nucleR: A package for non-parametric nucleosome positioning,” Bioinformatics, 27, 2149–2150. doi:10.1093/bioinformatics/btr345
Ganguli, S., D. Gopal and D. Abhijit (2012): “Using svm for identifying epigenetic patterns in microsatellites in human sex determining genes and its homologues,” J. Pharm. Sci. Res., 4, 1692–1696.
Green, P. (1995): “Reversible jump markov chain monte carlo computation and bayesian model determination,” Biometrika, 82, 711–732. doi:10.1093/biomet/82.4.711
Kuan, P., D. Huebert, A. Gasch and S. Keles (2009): “A non-homogeneous hidden-state model on first order difference for automatic detection of nucleosome positions,” Stat. Appl. Genet. Mol. Biol., 8, Article 29.
Lopez-Serra, L., G. Kelly, H. Patel, A. Stewart and F. Uhlmann (2014): “The scc2-scc4 complex acts in sister chromatid cohesion and transcriptional regulation by maintaining nucleosome-free regions,” Nat. Genet., 46, 1147–1151.
Mendenhall, E. and B. Bernstein (2008): “Dynamic nucleosomes,” Curr. Opin. Genet. Dev., 18, 109–115.
Mitra, R. and M. Gupta (2011): “A continuous-index bayesian hidden markov model for prediction of nucleosome positioning in genomic dna,” Biostatistics, 12, 462–477. doi:10.1093/biostatistics/kxq077
Pepke, S., B. Wold and A. Mortazavi (2009): “Computation for chip-seq and rna-seq studies,” Nat. Methods, 6, 22–32.
Polishko, A., N. Ponts, K. G. Le Roch and S. Lonardi (2012): “Normal: Accurate nucleosome positioning using a modified gaussian mixture model,” Bioinformatics, 28, 242–249. doi:10.1093/bioinformatics/bts206
Polishko, A., E. M. Bunnik, K. G. Le Roch and S. Lonardi (2014): “Puffin – a parameter-free method to build nucleosome maps from paired-end reads,” BMC Bioinformatics, 15, S11. doi:10.1186/1471-2105-15-S9-S11
Richardson, S. and P. Green (1997): “On bayesian analysis of mixtures with an unknown number of components,” J. Roy. Stat. Soc. B, 59, 731–792.
Robert, C. and G. Casella (2004): Monte Carlo statistical methods, 2nd ed., New York: Springer-Verlag. doi:10.1007/978-1-4757-4145-2
Robinson, J. T., T. Helga, W. Winckler, M. Guttman, E. S. Lander, G. Getz and J. P. Mesirov (2011): “Integrative genomics viewer,” Nat. Biotechnol., 29, 24–26.
Samb, R., A. Deschênes, P. Belleau and A. Droit (2015): nucleoSim: Generate synthetic nucleosome maps, URL https://github.com/ArnaudDroitLab/nucleoSim, r package version 0.99.0.
Schöpflin, R., V. B. Teif, O. Müller, C. Weinberg, K. Rippe and G. Wedemann (2013): “Modeling nucleosome position distributions from experimental nucleosome positioning maps,” Bioinformatics, 29, 2380–2386. doi:10.1093/bioinformatics/btt404
Schwarz, G. (1978): “Estimating the dimension of a model,” Ann. Stat., 6, 461–464.
Tierney, L. (1994): “Markov chains for exploring posterior distributions,” Ann. Stat., 22, 1701–1762.
Xi, L., Y. Fondufe-Mittendorf, L. Xia, J. Flatow, J. Widom and J.-P. Wang (2010): “Predicting nucleosome positioning using a duration hidden markov model,” BMC Bioinformatics, 11, 346. doi:10.1186/1471-2105-11-346
Zhang, X., G. Robertson, M. Krzywinski, K. Ning, A. Droit, S. Jones and R. Gottardo (2011): “Pics: Probabilistic inference for chip-seq,” Biometrics, 67, 151–163. doi:10.1111/j.1541-0420.2010.01441.x
Zhang, X., G. Robertson, S. Woo, B. G. Hoffman and R. Gottardo (2012): “Probabilistic inference for nucleosome positioning with MNase-based or sonicated short-read data,” PLoS One, 7, e32095. doi:10.1371/journal.pone.0032095
Supplemental Material:
The online version of this article (DOI: 10.1515/sagmb-2014-0098) offers supplementary material, available to authorized users.
©2015 by De Gruyter
Articles in the same Issue
- Frontmatter
- Research Articles
- Homology cluster differential expression analysis for interspecies mRNA-Seq experiments
- Using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling
- On the validity of within-nuclear-family genetic association analysis in samples of extended families
- An Empirical Bayes risk prediction model using multiple traits for sequencing data
- Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data