Abstract
We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance on both simulated and real BS datasets. Our main findings are as follows. First, parameter settings can strongly affect the accuracy of DM identification: compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods are more accurate when identifying simulated DM regions that are long and have small within-group variation, but their results show low concordance, probably because of the different approaches they use for DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the other methods, especially in DM regions with large variation. Finally, among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best represents the raw methylation signals. Based on these results, we suggest that users select a DM identification method according to the characteristics of their data and the advantages of each method.
Acknowledgments
This work is supported by Dr. Shuying Sun’s start-up funds and the Research Enhancement Program provided by Texas State University. We are very grateful to the three anonymous reviewers for their comments and suggestions, which greatly helped us improve this manuscript.
Supplemental Material:
The online version of this article (DOI: 10.1515/sagmb-2015-0078) offers supplementary material, available to authorized users.
©2016 by De Gruyter