Abstract
RNA-Seq is a developing technology for generating gene expression data by directly sequencing mRNA molecules in a sample. RNA-Seq data consist of counts of reads assigned to each gene, and these counts are often used to identify differentially expressed (DE) genes. A common statistical method for analyzing RNA-Seq data is Significance Analysis of Microarrays with emphasis on RNA-Seq data (SAMseq). SAMseq is a nonparametric method that uses a resampling technique to account for differences in sequencing depth when identifying DE genes. We propose a modification of this method that accounts for asymmetry in the distribution of effect sizes by using the sign of the test statistics. Through simulation studies, we show that the proposed method, compared with the traditional SAMseq method and other existing methods, either provides greater power for identifying truly DE genes or controls the false discovery rate (FDR) more adequately in most settings where asymmetry is present. We illustrate the use of the proposed method by analyzing an RNA-Seq data set containing samples from the C57BL/6J (B6) and DBA/2J (D2) mouse strains.
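The abstract does not give the algorithmic details, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes Poisson resampling of counts to a common depth (taken here as the geometric mean of the sample totals), a centered Wilcoxon rank-sum statistic per gene, and a permutation-based FDR estimate computed separately for the positive and negative tails. All function names (`resample_to_common_depth`, `rank_sum_stats`, `permutation_null`, `tail_fdr`) and the simulated data are hypothetical, and the estimate omits refinements such as estimating the proportion of null genes.

```python
import numpy as np


def resample_to_common_depth(counts, rng):
    """Poisson-resample a genes x samples count matrix to a shared depth.

    Assumption: the shared depth is the geometric mean of the column totals.
    """
    depths = counts.sum(axis=0).astype(float)
    target = np.exp(np.log(depths).mean())
    return rng.poisson(counts * (target / depths))


def rank_sum_stats(counts, labels, rng):
    """Centered Wilcoxon rank-sum statistic per gene (positive: higher in group 1)."""
    labels = np.asarray(labels)
    # Jitter below 1 breaks ties at random without reordering distinct integer counts.
    jittered = counts + rng.uniform(0.0, 0.1, size=counts.shape)
    ranks = jittered.argsort(axis=1).argsort(axis=1) + 1
    in_group = labels == 1
    n, n1 = labels.size, int(in_group.sum())
    return ranks[:, in_group].sum(axis=1) - n1 * (n + 1) / 2.0


def permutation_null(counts, labels, n_perm, rng):
    """Statistics recomputed under random relabelings; shape (n_perm, genes)."""
    labels = np.asarray(labels)
    return np.stack(
        [rank_sum_stats(counts, rng.permutation(labels), rng) for _ in range(n_perm)]
    )


def tail_fdr(stats, null_stats, cutoff, tail):
    """Estimated FDR in one tail: mean null exceedances over observed exceedances."""
    if tail == "upper":
        called = np.sum(stats >= cutoff)
        false_calls = np.mean(np.sum(null_stats >= cutoff, axis=1))
    else:
        called = np.sum(stats <= -cutoff)
        false_calls = np.mean(np.sum(null_stats <= -cutoff, axis=1))
    return 0.0 if called == 0 else float(min(1.0, false_calls / called))


# Simulated example (hypothetical data): only up-regulated DE genes, so the
# distribution of effect sizes is asymmetric.
rng = np.random.default_rng(0)
labels = np.array([0] * 5 + [1] * 5)
means = np.full((200, 10), 50.0)
means[:20, 5:] *= 3.0                      # first 20 genes are higher in group 1
means *= rng.uniform(0.5, 2.0, size=10)    # unequal sequencing depths
counts = rng.poisson(means)

resampled = resample_to_common_depth(counts, rng)
stats = rank_sum_stats(resampled, labels, rng)
null = permutation_null(resampled, labels, 50, rng)
print("upper-tail FDR:", tail_fdr(stats, null, 10.0, "upper"))
print("lower-tail FDR:", tail_fdr(stats, null, 10.0, "lower"))
```

Treating the two tails separately means that a cutoff on the positive side is judged only against positive null statistics, which is one simple way the sign of a statistic can enter an FDR estimate when most DE genes change in the same direction.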
©2017 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Research Articles
- Approximate maximum likelihood estimation for population genetic inference
- A smoothed EM-algorithm for DNA methylation profiles from sequencing-based methods in cell lines or for a single cell type
- Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes
- A statistical method for analysing cospeciation in tritrophic ecology using electrical circuit theory
- Multivariate association between single-nucleotide polymorphisms in Alzgene linkage regions and structural changes in the brain: discovery, refinement and validation
- Bayesian estimation of differential transcript usage from RNA-seq data
- A Bayesian hierarchical model for identifying significant polygenic effects while controlling for confounding and repeated measures