Detecting differentially expressed genes from RNA-seq data using fuzzy clustering

  • Yuki Ando and Asanao Shimokawa
Published/Copyright: July 29, 2024

Abstract

A two-group comparison test is generally performed on RNA sequencing data to detect differentially expressed genes (DEGs). However, the accuracy of this approach is low because of the small sample size. To address this, we propose a method using fuzzy clustering that artificially generates data with expression patterns similar to those of DEGs and identifies genes that are highly likely to be classified into the same cluster as the initial cluster data. The proposed method is advantageous in that it does not perform any test. Furthermore, a certain level of accuracy can be maintained even when the sample size is biased, and we show that such a situation may improve the accuracy of the proposed method. We compared the proposed method with the conventional method using simulations. In the simulations, we varied the sample size and the difference between the expression levels of group 1 and group 2 in the DEGs to obtain the desired accuracy of the proposed method. The results show that the proposed method outperforms the conventional method under all simulated conditions. We also show that the effect of the difference between group 1 and group 2 on the accuracy is more prominent when the sample size is biased.
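
The clustering idea in the abstract can be illustrated with standard fuzzy c-means. The following is a minimal sketch, not the authors' implementation: the abstract does not state how the artificial DEG-like data are generated or how cluster memberships are turned into a decision. The toy profiles, the gene names, the fuzzifier m = 2, the choice of initial centers, the assumption that the artificial profiles seed the DEG cluster, and the 0.9 membership cutoff are all hypothetical choices made for illustration.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update: u[i][k] for point i and cluster k."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if any(dk == 0.0 for dk in d):
            # Point coincides with a center: share membership among those centers.
            zeros = [1.0 if dk == 0.0 else 0.0 for dk in d]
            total = sum(zeros)
            u.append([z / total for z in zeros])
        else:
            inv = [dk ** -power for dk in d]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(
            [sum(wi * x[j] for wi, x in zip(w, points)) / total for j in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships(points, centers, m), centers


# Hypothetical standardized profiles: two samples from group 1, then two from group 2.
# Assumption: artificial DEG-like profiles are added to the data and seed one cluster.
seed_profiles = [[1.0, 0.9, -1.0, -0.9], [0.9, 1.0, -0.9, -1.0]]
genes = {
    "geneA": [0.8, 1.1, -0.9, -1.0],   # higher in group 1
    "geneB": [0.1, -0.1, 0.0, 0.1],    # flat
    "geneC": [-0.1, 0.1, 0.1, -0.1],   # flat
    "geneD": [1.1, 0.8, -1.1, -0.8],   # higher in group 1
}
names = list(genes)
points = seed_profiles + [genes[g] for g in names]

u, centers = fuzzy_c_means(
    points, init_centers=[seed_profiles[0], [0.0, 0.0, 0.0, 0.0]]
)

# The cluster that the first artificial profile belongs to is treated as the DEG cluster.
seed_cluster = max(range(len(centers)), key=lambda k: u[0][k])
cutoff = 0.9  # hypothetical membership cutoff
candidates = [
    g for g, row in zip(names, u[len(seed_profiles):]) if row[seed_cluster] > cutoff
]
print(candidates)
```

In this toy setting no hypothesis test is run: a gene is flagged only because its membership in the cluster seeded by the artificial profiles exceeds the cutoff, which mirrors the test-free character of the approach described above.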


Corresponding author: Yuki Ando, Tokyo University of Science, Shinjuku-ku, 162-8601, Tokyo, Japan, E-mail: 

  1. Research ethics: Not applicable.

  2. Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Competing interests: The authors state no competing interests.

  4. Research funding: None declared.

  5. Data availability: The raw data can be obtained on request from the corresponding author.

Received: 2023-10-05
Accepted: 2024-06-02
Published Online: 2024-07-29

© 2024 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on January 1, 2026 from https://www.degruyterbrill.com/document/doi/10.1515/ijb-2023-0125/pdf