Detecting differentially expressed genes from RNA-seq data using fuzzy clustering

Yuki Ando; Asanao Shimokawa

doi:10.1515/ijb-2023-0125

Published/Copyright: July 29, 2024

From the journal The International Journal of Biostatistics, Volume 20, Issue 2

Abstract

A two-group comparison test is generally performed on RNA sequencing data to detect differentially expressed genes (DEGs). However, the accuracy of this method is low because the sample size is small. To address this, we propose a method based on fuzzy clustering. It artificially generates data with expression patterns similar to those of DEGs and identifies genes that are highly likely to be classified into the same cluster as these initial cluster data. The proposed method is advantageous in that it does not perform any test. Furthermore, a certain level of accuracy can be maintained even when the sample size is biased, and we show that such a situation may improve the accuracy of the proposed method. We compared the proposed method with the conventional method using simulations. In the simulations, we varied the sample size and the difference between the expression levels of group 1 and group 2 in the DEGs to obtain the desired accuracy of the proposed method. The results show that the proposed method is superior in all cases under the conditions simulated. We also show that the effect of the difference between group 1 and group 2 on the accuracy is more prominent when the sample size is biased.

Keywords: two-group comparison; DEGs; expression level; fold-change
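
The idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the algorithmic details, so everything below is an assumption made for illustration. The sketch uses plain fuzzy c-means with two clusters on row-centered log expression values, appends artificial profiles in which group 1 exceeds group 2 by a chosen log fold change, and flags real genes whose membership in the cluster favored by the artificial profiles exceeds a threshold. The function names (`memberships`, `fuzzy_c_means`, `call_deg_like`), the parameters (`log_fc`, `n_art`, `noise_sd`, `threshold`), and the simulated data are all hypothetical, and only genes that are higher in group 1 are considered.

```python
import numpy as np


def memberships(X, centers, m=2.0):
    """Fuzzy c-means membership matrix (rows sum to 1) for fixed centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)       # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))   # equals distance ** (-2 / (m - 1))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, init_centers, m=2.0, n_iter=200, tol=1e-8):
    """Standard fuzzy c-means iterations started from the given centers."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        W = memberships(X, centers, m) ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break
    return memberships(X, centers, m), centers


def call_deg_like(X, n1, n2, log_fc, n_art=50, noise_sd=0.3,
                  threshold=0.5, seed=0):
    """Flag genes sharing a fuzzy cluster with artificial DEG-like profiles.

    Hypothetical helper: names and defaults are choices made for this sketch.
    """
    rng = np.random.default_rng(seed)
    # Center each gene so clustering sees the pattern, not the abundance.
    Z = X - X.mean(axis=1, keepdims=True)
    # Artificial profiles: group 1 above group 2 by log_fc, then row-centered.
    pattern = np.concatenate([np.full(n1, log_fc), np.zeros(n2)])
    art = pattern + rng.normal(0.0, noise_sd, size=(n_art, n1 + n2))
    art -= art.mean(axis=1, keepdims=True)
    data = np.vstack([Z, art])
    # Initial centers: mean artificial profile and mean real profile.
    init = np.vstack([art.mean(axis=0), Z.mean(axis=0)])
    U, _ = fuzzy_c_means(data, init)
    # The DEG-like cluster is whichever one the artificial rows favor.
    deg_cluster = int(U[len(Z):].mean(axis=0).argmax())
    score = U[:len(Z), deg_cluster]
    return np.flatnonzero(score > threshold), score


# Toy log2 expression data (simulated, not from the paper):
# the first n_deg genes are shifted upward in group 1.
rng = np.random.default_rng(1)
n1, n2, n_genes, n_deg, log_fc = 3, 3, 500, 50, 2.0
X = (rng.normal(8.0, 1.0, size=(n_genes, 1))
     + rng.normal(0.0, 0.3, size=(n_genes, n1 + n2)))
X[:n_deg, :n1] += log_fc

called, score = call_deg_like(X, n1, n2, log_fc)
print(len(called), "genes flagged as DEG-like")
```

Because no hypothesis test is run, the membership score acts as a ranking rather than a p-value; the 0.5 cut-off here is arbitrary. Unequal group sizes can be explored by changing `n1` and `n2`.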

Corresponding author: Yuki Ando, Tokyo University of Science, Shinjuku-ku, 162-8601, Tokyo, Japan, E-mail: 1422701@ed.tus.ac.jp

Research ethics: Not applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: The authors state no competing interests.
Research funding: None declared.
Data availability: The raw data can be obtained on request from the corresponding author.

Received: 2023-10-05

Accepted: 2024-06-02

Published Online: 2024-07-29
