Abstract
Researchers often need to identify genes that follow a particular expression pattern across conditions such as tissues or genotypes. A common practice is to perform differential expression analysis for each condition separately and then intersect the resulting sets of differentially expressed (DE) or non-DE genes to obtain the genes that satisfy the pattern. This practice can produce many false positives, especially when the desired pattern involves equivalent expression under one condition. In this paper, we apply a Bayesian partition model to identify genes of all desired patterns while simultaneously controlling their false discovery rates (FDRs). Our simulation studies show that the common practice fails to control group-specific FDRs for patterns involving equivalent expression, whereas the proposed Bayesian method controls group-specific FDRs simultaneously in all settings studied. For patterns involving only DE genes, where the common practice does keep the FDR under control, the proposed method is more powerful. Our simulation studies also show that identifying patterns involving equivalent expression is inherently more challenging than identifying patterns involving only differential expression. Consequently, larger sample sizes are required to reach the same target power for the former type of pattern than for the latter.
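To make the idea of group-specific FDR control concrete, the following is a minimal sketch of one commonly used rule for Bayesian FDR control: rank genes by their posterior probability of following a pattern and keep the largest set whose average posterior error probability stays at or below the target level. It is an illustration under stated assumptions, not necessarily the procedure used in the paper. The function name `select_by_bayesian_fdr`, the input `post_probs`, and the example values are all hypothetical, and the posterior probabilities are assumed to have been computed already by some model.

```python
from typing import List


def select_by_bayesian_fdr(post_probs: List[float], alpha: float = 0.05) -> List[int]:
    """Return indices of genes assigned to one pattern with estimated FDR <= alpha.

    post_probs[i] is the posterior probability (assumed given) that gene i
    follows the pattern. This is an illustrative rule, not the paper's code.
    """
    # Rank genes from most to least likely to follow the pattern.
    order = sorted(range(len(post_probs)), key=lambda i: post_probs[i], reverse=True)

    selected: List[int] = []
    error_sum = 0.0  # expected number of false discoveries among selected genes
    for i in order:
        # Selecting gene i adds (1 - p_i) expected false discoveries.
        new_error_sum = error_sum + (1.0 - post_probs[i])
        # The running average only grows along the ranking, so stop at the
        # first gene that would push the estimated FDR above alpha.
        if new_error_sum / (len(selected) + 1) > alpha:
            break
        error_sum = new_error_sum
        selected.append(i)
    return selected


# Hypothetical posterior probabilities for five genes and one pattern.
example_probs = [0.99, 0.20, 0.97, 0.90, 0.50]
example_selection = select_by_bayesian_fdr(example_probs, alpha=0.05)
```

Applying such a rule separately to the posterior probabilities of each pattern would give one selected gene set per pattern, each with its own estimated FDR at or below `alpha`.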
Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was partially done with the use of the BIOMIX compute cluster at University of Delaware, which was made possible through funding from Delaware INBRE (NIGMS P20GM103446), the State of Delaware and the Delaware Biotechnology Institute.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/sagmb-2022-0025).
© 2023 Walter de Gruyter GmbH, Berlin/Boston