Abstract
The availability of large quantities of transcriptomic data in the form of RNA-seq count data has necessitated the development of methods to identify genes that are differentially expressed between experimental conditions. Many existing approaches apply a parametric model of gene expression and so place strong assumptions on the distribution of the data. Here we explore an alternative, nonparametric approach that applies an empirical likelihood framework, allowing us to define likelihoods without specifying a parametric model of the data. We demonstrate the performance of our method when applied to gold standard datasets and to existing experimental data. Our approach outperforms or closely matches the performance of existing methods in the literature, and requires modest computational resources. An R package, EmpDiff, implementing the methods described in the paper, is available from: http://homepages.inf.ed.ac.uk/tthorne/software/packages/EmpDiff_0.99.tar.gz.
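To make the idea of a likelihood without a parametric model concrete, the following is a minimal sketch of the standard one-sample empirical likelihood ratio for a mean. It is not the test implemented in EmpDiff and it is written in Python rather than R; the function names `el_stat_for_mean` and `chi2_1_pvalue` are hypothetical and introduced only for illustration. Each observation receives a weight, the weights are chosen to maximise their product subject to the weighted mean equalling a hypothesised value `mu`, and the resulting statistic is compared with a chi-squared distribution with one degree of freedom.

```python
import math


def el_stat_for_mean(x, mu, n_iter=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for the mean.

    Illustrative one-sample sketch only; not the EmpDiff implementation.
    """
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        # mu lies outside the range of the data, so no valid weights exist
        return math.inf

    # The Lagrange multiplier lam must keep every 1 + lam * z_i positive.
    lo, hi = -1.0 / z_max, -1.0 / z_min
    margin = 1e-10 * (hi - lo)
    lo, hi = lo + margin, hi - margin

    def g(lam):
        # Constraint equation; strictly decreasing in lam on (lo, hi)
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # Bisection for the root of g
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    # Optimal weights are w_i = 1 / (n * (1 + lam * z_i)), which gives:
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data for illustration
    stat = el_stat_for_mean(values, 2.0)
    print(stat, chi2_1_pvalue(stat))
```

The statistic is zero when `mu` equals the sample mean and grows as `mu` moves away from it, which is what allows it to be used for testing without assuming a particular count distribution.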
Acknowledgments
This work was supported by the University of Edinburgh Chancellor’s Fellowship to T.T.
Supplemental Material:
The online version of this article (DOI: 10.1515/sagmb-2015-0095) offers supplementary material, available to authorized users.
©2015 by De Gruyter
Articles in the same Issue
- Frontmatter
- Research Articles
- Homology cluster differential expression analysis for interspecies mRNA-Seq experiments
- Using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling
- On the validity of within-nuclear-family genetic association analysis in samples of extended families
- An Empirical Bayes risk prediction model using multiple traits for sequencing data
- Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data