
A smoothed EM-algorithm for DNA methylation profiles from sequencing-based methods in cell lines or for a single cell type

  • Lajmi Lakhal-Chaieb, Celia M.T. Greenwood, Mohamed Ouhourane, Kaiqiong Zhao, Belkacem Abdous and Karim Oualkacha
Published/Copyright: October 23, 2017

Abstract

We consider the assessment of DNA methylation profiles for sequencing-derived data from a single cell type or from cell lines. We derive a kernel-smoothed EM algorithm that can analyze an entire chromosome at once, simultaneously correcting for experimental errors arising from either the pre-treatment steps or the sequencing stage and accounting for spatial correlations between DNA methylation profiles at neighbouring CpG sites. The outcomes of our algorithm are then used to (i) call the true methylation status at each CpG site, (ii) provide accurate smoothed estimates of DNA methylation levels, and (iii) detect differentially methylated regions. Simulations show that the proposed methodology outperforms existing analysis methods that either ignore the correlation between DNA methylation profiles at neighbouring CpG sites or do not correct for errors. The use of the proposed inference procedure is illustrated through the analysis of a publicly available data set from a cell line of induced pluripotent H9 human embryonic stem cells and of a data set in which methylation measures were obtained for a small genomic region in three different immune cell types separated from whole blood.

Acknowledgement

The postdoctoral salary of Mohamed Ouhourane was gratefully supported by the Natural Sciences and Engineering Research Council of Canada through individual discovery research grants to Belkacem Abdous and Karim Oualkacha and by the Fonds de recherche du Québec – Santé through individual grant # 31110 to Karim Oualkacha. Thanks also to the Ludmer Centre for Neuroinformatics and Mental Health.

Appendix

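The complete data is $\Delta_c = \{(X_i, Y_i, L_i, S_i),\ i = 1, \ldots, N\}$ and the corresponding log-likelihood function is

$$
l_c = \sum_{i=1}^{N} (1 - s_i)\{y_i \log(p_0) + (x_i - y_i)\log(1 - p_0)\} + \sum_{i=1}^{N} s_i\{y_i \log(p_1) + (x_i - y_i)\log(1 - p_1)\} + \sum_{i=1}^{N} \{s_i \log[1 - \pi(l_i)] + (1 - s_i)\log[\pi(l_i)]\}.
$$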
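As a concrete reading of this expression, the sketch below evaluates the complete-data log-likelihood for given latent states. It is only an illustration: the function and argument names are hypothetical, and it assumes that $y_i$ counts successes out of $x_i$ trials, which is what the binomial-type terms in the formula imply. As in the formula, the constant binomial coefficients are omitted.

```python
import math


def complete_loglik(x, y, s, pi_l, p0, p1):
    """Evaluate l_c for observed (x_i, y_i), latent states s_i and pi(l_i).

    Assumption (not stated in this section): y_i is a count out of x_i.
    pi_l[i] is pi(l_i), the probability that s_i = 0.
    All probabilities must lie strictly between 0 and 1.
    """
    total = 0.0
    for xi, yi, si, pii in zip(x, y, s, pi_l):
        # State 0 uses p0, state 1 uses p1 (first two sums of l_c).
        p = p1 if si == 1 else p0
        total += yi * math.log(p) + (xi - yi) * math.log(1.0 - p)
        # Contribution of the latent state itself (third sum of l_c).
        total += math.log(1.0 - pii) if si == 1 else math.log(pii)
    return total
```

Because the $s_i$ are unobserved in practice, this quantity is not computed directly; the EM algorithm replaces each $s_i$ by its conditional expectation.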

We begin by estimating $\pi(l)$ as a piecewise constant function. Without loss of generality, let $[0, L)$ be the set of all genomic positions and consider the partition $0 = L_0 < L_1 < \cdots < L_C = L$. Let $\pi(l) = \sum_{c=1}^{C} \gamma_c I(L_{c-1} \le l < L_c)$. The parameters of the model become $p_0, p_1, \gamma_1, \ldots, \gamma_C$. We propose to estimate these parameters using an EM-algorithm. The E-step and M-step of this iterative algorithm are:

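  • E-step: Compute

    $$
    \hat{\theta}_{i1} = E[S_i \mid X_i = x_i, Y_i = y_i, \hat{p}_0, \hat{p}_1, \hat{\pi}(l)] = \frac{[1 - \hat{\pi}(l_i)]\, g(y_i \mid x_i, \hat{p}_1)}{\hat{\pi}(l_i)\, g(y_i \mid x_i, \hat{p}_0) + [1 - \hat{\pi}(l_i)]\, g(y_i \mid x_i, \hat{p}_1)}, \qquad \hat{\theta}_{i0} = 1 - \hat{\theta}_{i1},
    $$
  • M-step: Compute

    $$
    \begin{aligned}
    \hat{p}_0 &= \frac{\sum_{i=1}^{N} \hat{\theta}_{i0}\, y_i}{\sum_{i=1}^{N} \hat{\theta}_{i0}\, x_i}, \qquad
    \hat{p}_1 = \frac{\sum_{i=1}^{N} \hat{\theta}_{i1}\, y_i}{\sum_{i=1}^{N} \hat{\theta}_{i1}\, x_i}, \\
    \hat{\gamma}_c &= \frac{\sum_{i=1}^{N} \hat{\theta}_{i0}\, I(L_{c-1} \le l_i < L_c)}{\sum_{i=1}^{N} I(L_{c-1} \le l_i < L_c)}, \\
    \hat{\pi}(l) &= \sum_{c=1}^{C} \frac{\sum_{i=1}^{N} \hat{\theta}_{i0}\, I(L_{c-1} \le l_i < L_c)}{\sum_{i=1}^{N} I(L_{c-1} \le l_i < L_c)}\, I(L_{c-1} \le l < L_c).
    \end{aligned}
    \tag{7}
    $$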
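The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes $g(y \mid x, p)$ to be the binomial probability mass function (consistent with the log-likelihood above but not defined in this section), and the function names, starting values and stopping rule are all hypothetical choices.

```python
import math


def binom_pmf(y, x, p):
    # Assumed form of g(y | x, p): binomial pmf with x trials.
    return math.comb(x, y) * p ** y * (1.0 - p) ** (x - y)


def em_piecewise(x, y, l, breaks, p0=0.1, p1=0.9, n_iter=200, tol=1e-10):
    """EM with piecewise constant pi(l); breaks = [L_0, L_1, ..., L_C].

    Returns (p0, p1, gamma, theta0), where gamma[c] estimates pi on
    the interval [breaks[c], breaks[c + 1]).
    """
    C = len(breaks) - 1
    # Interval index of each position l_i (every l_i must lie in [L_0, L_C)).
    idx = [next(c for c in range(C) if breaks[c] <= li < breaks[c + 1])
           for li in l]
    counts = [idx.count(c) for c in range(C)]
    gamma = [0.5] * C
    theta0 = [0.5] * len(x)
    for _ in range(n_iter):
        # E-step: posterior probability that S_i = 1, then theta_i0.
        theta1 = []
        for xi, yi, ci in zip(x, y, idx):
            a = (1.0 - gamma[ci]) * binom_pmf(yi, xi, p1)
            b = gamma[ci] * binom_pmf(yi, xi, p0)
            theta1.append(a / (a + b) if a + b > 0 else 0.5)
        theta0 = [1.0 - t for t in theta1]
        # M-step: weighted proportions, as in (7).
        new_p0 = (sum(t * yi for t, yi in zip(theta0, y))
                  / sum(t * xi for t, xi in zip(theta0, x)))
        new_p1 = (sum(t * yi for t, yi in zip(theta1, y))
                  / sum(t * xi for t, xi in zip(theta1, x)))
        new_gamma = [
            sum(t for t, ci in zip(theta0, idx) if ci == c) / counts[c]
            if counts[c] > 0 else gamma[c]  # empty interval: keep old value
            for c in range(C)
        ]
        change = (abs(new_p0 - p0) + abs(new_p1 - p1)
                  + sum(abs(g1 - g0) for g1, g0 in zip(new_gamma, gamma)))
        p0, p1, gamma = new_p0, new_p1, new_gamma
        if change < tol:
            break
    return p0, p1, gamma, theta0
```

The labels of the two states are not identified by the likelihood alone, so the starting values (here $p_0 < p_1$) determine which component plays which role.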

Note that (7) can be written as

$$
\hat{\pi}(l) = \frac{\sum_{i=1}^{N} \hat{\theta}_{i0}\, w_i}{\sum_{i=1}^{N} w_i},
$$

where $w_i = \sum_{c=1}^{C} I(L_{c-1} \le l,\, l_i < L_c)$ equals one if $l_i$ lies in the same interval as $l$ and zero otherwise, so that $\sum_{i=1}^{N} w_i$ is the number of CpG sites in the same interval as $l$. This is a Nadaraya-Watson estimator with the kernel function $K(l, l_i) = \sum_{c=1}^{C} I(L_{c-1} \le l,\, l_i < L_c)$. A generalisation of this estimator to an arbitrary kernel function is given by

$$
w_i = K\!\left(\frac{l - l_i}{h}\right),
$$

where K is a kernel function and h a bandwidth that controls the smoothness of the estimators.
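The kernel-weighted form of the estimator can be sketched as below. The Gaussian and box kernels, like the function names, are illustrative assumptions; this section does not say which kernel the authors use. Note that the box kernel here is a sliding window centred at $l$, which differs from the fixed-partition kernel of the piecewise constant estimator.

```python
import math


def gaussian_kernel(u):
    # Unnormalised Gaussian kernel; the constant cancels in the ratio.
    return math.exp(-0.5 * u * u)


def box_kernel(u):
    # Sliding-window kernel: weight one within one bandwidth of l.
    return 1.0 if abs(u) <= 1.0 else 0.0


def smooth_pi(l0, positions, theta0, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate of pi at position l0.

    positions: CpG positions l_i; theta0: E-step values theta_i0;
    h: bandwidth. Raises ZeroDivisionError if every weight is zero,
    which can happen with a compactly supported kernel and small h.
    """
    w = [kernel((l0 - li) / h) for li in positions]
    return sum(t * wi for t, wi in zip(theta0, w)) / sum(w)
```

For example, `smooth_pi(5.0, [0.0, 10.0], [0.0, 1.0], h=3.0)` weights both sites equally by symmetry and returns their average. A larger `h` pulls the estimate towards the overall mean of the `theta0` values, while a smaller `h` tracks local changes more closely.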

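Furthermore, one can also generalise (7) using local polynomial techniques, which yield $\hat{\pi}(l) = e^{\hat{a}_0}/(1 + e^{\hat{a}_0})$, where $\hat{a}_0$ is the first element of $(\hat{a}_0, \hat{a}_1, \ldots, \hat{a}_m)$ and this set is obtained by minimizing

$$
\sum_{i=1}^{N} K\!\left(\frac{l_i - l}{h}\right) \left\{ \hat{\theta}_{i0} - \frac{e^{a_0 + a_1 (l_i - l) + \cdots + a_m (l_i - l)^m}}{1 + e^{a_0 + a_1 (l_i - l) + \cdots + a_m (l_i - l)^m}} \right\}^2.
$$

Here, $m$ is the order of the polynomial we use to locally approximate $\log\{\pi(l)/[1 - \pi(l)]\}$.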
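For the local linear case ($m = 1$), the minimisation can be sketched with a damped Gauss-Newton iteration. Everything specific here is an assumption made for illustration: the Gaussian kernel, the optimiser, the small ridge term and the function names are not taken from the article. The polynomial is written in the scaled variable $(l_i - l)/h$, a reparametrisation that leaves $a_0$, and therefore $\hat{\pi}(l)$, unchanged.

```python
import math


def expit(z):
    # Numerically stable logistic function e^z / (1 + e^z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_linear_pi(l0, positions, theta0, h, n_iter=100):
    """Local linear (m = 1) estimate of pi at l0 by weighted least squares."""
    u = [(li - l0) / h for li in positions]
    w = [math.exp(-0.5 * ui * ui) for ui in u]  # assumed Gaussian kernel

    def objective(a0, a1):
        return sum(wi * (t - expit(a0 + a1 * ui)) ** 2
                   for wi, t, ui in zip(w, theta0, u))

    a0, a1 = 0.0, 0.0
    for _ in range(n_iter):
        # Accumulate the 2x2 Gauss-Newton normal equations.
        s00 = s01 = s11 = g0 = g1 = 0.0
        for wi, t, ui in zip(w, theta0, u):
            mu = expit(a0 + a1 * ui)
            r = t - mu                # residual
            j = mu * (1.0 - mu)       # derivative of mu w.r.t. a0
            s00 += wi * j * j
            s01 += wi * j * j * ui
            s11 += wi * j * j * ui * ui
            g0 += wi * j * r
            g1 += wi * j * r * ui
        s00 += 1e-10                  # tiny ridge keeps the system solvable
        s11 += 1e-10
        det = s00 * s11 - s01 * s01
        if det <= 0.0:
            break
        d0 = (s11 * g0 - s01 * g1) / det
        d1 = (s00 * g1 - s01 * g0) / det
        # Step halving: accept only steps that do not increase the criterion.
        step, current = 1.0, objective(a0, a1)
        while step > 1e-8 and objective(a0 + step * d0, a1 + step * d1) > current:
            step *= 0.5
        a0 += step * d0
        a1 += step * d1
        if abs(step * d0) + abs(step * d1) < 1e-12:
            break
    return expit(a0)
```

Because the fitted value passes through the logistic transform, the estimate always lies strictly between 0 and 1, which an unconstrained local polynomial fit on $\hat{\theta}_{i0}$ itself would not guarantee.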


Supplemental Material:

The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2016-0062).


Published Online: 2017-10-23
Published in Print: 2017-11-27

©2017 Walter de Gruyter GmbH, Berlin/Boston
