
A smoothed EM-algorithm for DNA methylation profiles from sequencing-based methods in cell lines or for a single cell type

  • Lajmi Lakhal-Chaieb, Celia M.T. Greenwood, Mohamed Ouhourane, Kaiqiong Zhao, Belkacem Abdous and Karim Oualkacha
Published/Copyright: October 23, 2017

Abstract

We consider the assessment of DNA methylation profiles for sequencing-derived data from a single cell type or from cell lines. We derive a kernel-smoothed EM-algorithm that can analyze an entire chromosome at once, simultaneously correcting for experimental errors arising from either the pre-treatment steps or the sequencing stage and taking into account spatial correlations between DNA methylation profiles at neighbouring CpG sites. The outcomes of our algorithm are then used to (i) call the true methylation status at each CpG site, (ii) provide accurate smoothed estimates of DNA methylation levels, and (iii) detect differentially methylated regions. Simulations show that the proposed methodology outperforms existing analysis methods that either ignore the correlation between DNA methylation profiles at neighbouring CpG sites or do not correct for errors. The use of the proposed inference procedure is illustrated through the analysis of a publicly available data set from a cell line of induced pluripotent H9 human embryonic stem cells, and of a data set in which methylation measures were obtained for a small genomic region in three different immune cell types separated from whole blood.

Acknowledgement

The postdoctoral salary of Mohamed Ouhourane was gratefully supported by the Natural Sciences and Engineering Research Council of Canada through individual discovery research grants to Belkacem Abdous and Karim Oualkacha and by the Fonds de recherche du Québec – Santé through individual grant # 31110 to Karim Oualkacha. Thanks also to the Ludmer Centre for Neuroinformatics and Mental Health.

Appendix

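The complete data is $\Delta_c = \{(X_i, Y_i, L_i, S_i),\ i = 1, \ldots, N\}$ and the corresponding log-likelihood function is

$$l_c = \sum_{i=1}^{N} (1 - s_i)\{y_i \log(p_0) + (x_i - y_i)\log(1 - p_0)\} + \sum_{i=1}^{N} s_i\{y_i \log(p_1) + (x_i - y_i)\log(1 - p_1)\} + \sum_{i=1}^{N}\{s_i \log[1 - \pi(l_i)] + (1 - s_i)\log[\pi(l_i)]\}.$$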
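As a minimal sketch of how this complete-data log-likelihood could be evaluated, the function below sums the three groups of terms site by site. The function name `complete_loglik` and the list-based arguments are illustrative choices, not taken from the paper; `pi_l` is assumed to hold the values π(l_i) already evaluated at each site. Terms that do not depend on the parameters, such as binomial coefficients, are left out, as in the expression above.

```python
import math

def complete_loglik(x, y, s, pi_l, p0, p1):
    """Complete-data log-likelihood l_c for N CpG sites.

    x, y  : read depth x_i and methylated-read count y_i per site
    s     : latent status s_i in {0, 1}
    pi_l  : pi(l_i), i.e. P(S_i = 0) at each site (assumed precomputed)
    p0, p1: success probabilities of the two mixture components
    """
    total = 0.0
    for xi, yi, si, pii in zip(x, y, s, pi_l):
        p = p1 if si == 1 else p0
        # binomial part: y_i log(p) + (x_i - y_i) log(1 - p)
        total += yi * math.log(p) + (xi - yi) * math.log(1.0 - p)
        # mixing part: s_i log[1 - pi(l_i)] + (1 - s_i) log[pi(l_i)]
        total += math.log(1.0 - pii) if si == 1 else math.log(pii)
    return total
```

For example, `complete_loglik([10], [9], [1], [0.3], 0.1, 0.9)` evaluates 9 log(0.9) + log(0.1) + log(0.7).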

We begin by estimating $\pi(l)$ as a piecewise constant function. Without loss of generality, let $[0, L)$ be the set of all genomic positions and consider the partition $0 = L_0 < L_1 < \cdots < L_C = L$. Let $\pi(l) = \sum_{c=1}^{C} \gamma_c\, I(L_{c-1} \le l < L_c)$. The parameters of the model become $p_0, p_1, \gamma_1, \ldots, \gamma_C$. We propose to estimate these parameters using an EM-algorithm. The E-step and M-step of this iterative algorithm are:

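  • E-step: Compute

    $$\hat\theta_{i1} = E[S_i \mid X_i = x_i, Y_i = y_i, \hat p_0, \hat p_1, \hat\pi(l)] = \frac{[1 - \hat\pi(l_i)]\, g(y_i \mid x_i, \hat p_1)}{\hat\pi(l_i)\, g(y_i \mid x_i, \hat p_0) + [1 - \hat\pi(l_i)]\, g(y_i \mid x_i, \hat p_1)}, \qquad \hat\theta_{i0} = 1 - \hat\theta_{i1},$$
  • M-step: Compute

    $$\begin{aligned}
    \hat p_0 &= \frac{\sum_{i=1}^{N} \hat\theta_{i0}\, y_i}{\sum_{i=1}^{N} \hat\theta_{i0}\, x_i}, \qquad
    \hat p_1 = \frac{\sum_{i=1}^{N} \hat\theta_{i1}\, y_i}{\sum_{i=1}^{N} \hat\theta_{i1}\, x_i}, \\
    \hat\gamma_c &= \frac{\sum_{i=1}^{N} \hat\theta_{i0}\, I(L_{c-1} \le l_i < L_c)}{\sum_{i=1}^{N} I(L_{c-1} \le l_i < L_c)}, \\
    \hat\pi(l) &= \sum_{c=1}^{C} \frac{\sum_{i=1}^{N} \hat\theta_{i0}\, I(L_{c-1} \le l_i < L_c)}{\sum_{i=1}^{N} I(L_{c-1} \le l_i < L_c)}\, I(L_{c-1} \le l < L_c).
    \end{aligned} \tag{7}$$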
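The two steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: g(y | x, p) is taken to be the binomial probability mass function (its binomial coefficient cancels in the E-step ratio, so it is omitted), and the function name `em_piecewise`, the starting values, the fixed iteration count and the clamping constant `eps` are all choices made here for the example. Probabilities are multiplied directly, which can underflow at very high read depth; a log-space version would be needed there.

```python
from bisect import bisect_right

def em_piecewise(x, y, pos, breaks, p0=0.2, p1=0.8, n_iter=100, eps=1e-10):
    """EM for the two-component mixture with piecewise-constant pi(l).

    x, y   : read depths and methylated-read counts per CpG site
    pos    : positions l_i, each in [breaks[0], breaks[-1])
    breaks : sorted partition points L_0 < L_1 < ... < L_C
    Returns (p0, p1, gamma, theta1).
    """
    def clamp(v):
        # numerical safeguard (not in the source): keep values inside (0, 1)
        return min(max(v, eps), 1.0 - eps)

    def ratio(num, den):
        return clamp(num / max(den, eps))

    n_bins = len(breaks) - 1
    # bin_of[i] = c such that breaks[c] <= l_i < breaks[c + 1]
    bin_of = [bisect_right(breaks, li) - 1 for li in pos]
    gamma = [0.5] * n_bins
    theta1 = []
    for _ in range(n_iter):
        # E-step: posterior probability that S_i = 1
        theta1 = []
        for xi, yi, c in zip(x, y, bin_of):
            a = (1.0 - gamma[c]) * p1 ** yi * (1.0 - p1) ** (xi - yi)
            b = gamma[c] * p0 ** yi * (1.0 - p0) ** (xi - yi)
            theta1.append(a / (a + b))
        theta0 = [1.0 - t for t in theta1]
        # M-step: weighted proportions for p0 and p1, bin means for gamma
        p0 = ratio(sum(t * yi for t, yi in zip(theta0, y)),
                   sum(t * xi for t, xi in zip(theta0, x)))
        p1 = ratio(sum(t * yi for t, yi in zip(theta1, y)),
                   sum(t * xi for t, xi in zip(theta1, x)))
        for c in range(n_bins):
            members = [t for t, b in zip(theta0, bin_of) if b == c]
            if members:  # leave gamma[c] unchanged for empty bins
                gamma[c] = clamp(sum(members) / len(members))
    return p0, p1, gamma, theta1
```

Since π(l) is the probability that S_i = 0, a bin whose sites mostly show few methylated reads ends up with γ̂_c close to one when p̂_0 is the smaller of the two success probabilities.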

Note that (7) can be written as

$$\hat\pi(l) = \frac{\sum_{i=1}^{N} \hat\theta_{i0}\, w_i}{\sum_{i=1}^{N} w_i},$$

where $w_i = \sum_{c=1}^{C} I(L_{c-1} \le l,\, l_i < L_c)$ equals one if $l$ and $l_i$ lie in the same interval and zero otherwise, so that $\sum_{i=1}^{N} w_i$ is the number of CpG sites in the same interval as $l$. This is a Nadaraya-Watson estimator with the kernel function $K(l, l_i) = \sum_{c=1}^{C} I(L_{c-1} \le l,\, l_i < L_c)$. A generalisation of this estimator to an arbitrary kernel function is given by taking

$$w_i = K\!\left(\frac{l - l_i}{h}\right),$$

where K is a kernel function and h a bandwidth that controls the smoothness of the estimators.
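A minimal sketch of this kernel-weighted estimate at a single position is shown below. The Gaussian kernel is an assumption made for the example (the text leaves K arbitrary), and the names `gaussian_kernel` and `smooth_pi` are hypothetical. The inputs `theta0` stand for the E-step quantities θ̂_i0.

```python
import math

def gaussian_kernel(u):
    # unnormalised Gaussian kernel; the constant cancels in the ratio
    return math.exp(-0.5 * u * u)

def smooth_pi(l0, positions, theta0, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate of pi at position l0.

    positions : CpG positions l_i
    theta0    : posterior probabilities theta_hat_{i0}
    h         : bandwidth (larger h gives a smoother estimate)
    """
    w = [kernel((l0 - li) / h) for li in positions]
    # raises ZeroDivisionError if every weight is zero,
    # i.e. l0 is too far from all sites for this bandwidth
    return sum(wi * t for wi, t in zip(w, theta0)) / sum(w)
```

Passing a box kernel such as `lambda u: 1.0 if abs(u) <= 1 else 0.0` gives a plain average of θ̂_i0 over the sites within distance h of `l0`, a moving-window analogue of the piecewise-constant estimate.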

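Furthermore, one can also generalise (7) using local polynomial techniques, which yield $\hat\pi(l) = e^{\hat a_0}/(1 + e^{\hat a_0})$, where $\hat a_0$ is the first element of $(\hat a_0, \hat a_1, \ldots, \hat a_m)$ and this set is obtained by minimizing

$$\sum_{i=1}^{N} K\!\left(\frac{l_i - l}{h}\right) \left\{ \hat\theta_{i0} - \frac{e^{a_0 + a_1 (l_i - l) + \cdots + a_m (l_i - l)^m}}{1 + e^{a_0 + a_1 (l_i - l) + \cdots + a_m (l_i - l)^m}} \right\}^2.$$

Here, $m$ is the order of the polynomial we use to locally approximate $\log\{\pi(l)/[1 - \pi(l)]\}$.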
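As a quick consistency check, added here as a worked special case rather than taken from the source, consider the local-constant fit $m = 0$. Write $K_i = K\{(l_i - l)/h\}$ and $\sigma(a) = e^{a}/(1 + e^{a})$. The criterion and its derivative are

$$Q(a_0) = \sum_{i=1}^{N} K_i \{\hat\theta_{i0} - \sigma(a_0)\}^2, \qquad \frac{dQ}{da_0} = -2\, \sigma'(a_0) \sum_{i=1}^{N} K_i \{\hat\theta_{i0} - \sigma(a_0)\}.$$

Since $\sigma'(a_0) = \sigma(a_0)\{1 - \sigma(a_0)\} > 0$, the derivative vanishes only when

$$\hat\pi(l) = \sigma(\hat a_0) = \frac{\sum_{i=1}^{N} K_i\, \hat\theta_{i0}}{\sum_{i=1}^{N} K_i},$$

provided this weighted average lies strictly between 0 and 1. For a symmetric kernel this is the Nadaraya-Watson estimator with weights $w_i = K_i$, so the local polynomial criterion contains the kernel-smoothed version of (7) as its order-zero case.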



Supplemental Material:

The online version of this article offers supplementary material (https://doi.org/10.1515/sagmb-2016-0062).


Published Online: 2017-10-23
Published in Print: 2017-11-27

©2017 Walter de Gruyter GmbH, Berlin/Boston
