
9 A hybrid feature gene selection approach by integrating variance filter, extremely randomized tree, and Cuckoo Search algorithm for cancer classification

  • Abrar Yaqoob, Navneet Kumar Verma, G. V. V. Jagannadha Rao and Rabia Musheer Aziz
This chapter is in the book Drug Discovery and Telemedicine

Abstract

In biomedical data mining, high-dimensional gene expression data, where the number of genes often exceeds the number of samples, poses a significant hurdle for accurate classification and analysis. To address this issue, this paper introduces a novel three-stage hybrid gene selection approach that combines a variance filter, an extremely randomized tree, and the Cuckoo Search algorithm. First, the variance filter reduces the dimensionality of the gene space by eliminating genes with low variability. Next, the extremely randomized tree method refines this subset by prioritizing genes with strong associations to the target phenotype. Finally, the Cuckoo Search algorithm identifies the optimal feature gene subset from this refined pool. The proposed methodology was evaluated on a breast cancer gene expression dataset using four classifiers: Random Forest, Linear Regression, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Experimental results showed that the proposed method consistently outperformed the extremely randomized tree and variance filter techniques used on their own. For instance, with the Random Forest classifier, the proposed method achieved 100% accuracy with 11 selected genes, compared to 95.96% and 83.91% for the extremely randomized tree and variance filter methods, respectively. Similar trends were observed with the other classifiers, where the proposed method achieved the highest accuracies. These findings indicate that the proposed hybrid approach can improve classification accuracy and reliability in biomedical data mining applications, offering a useful tool for gene selection and analysis in high-dimensional datasets.
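
The staged pipeline described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names, the toy data, the leave-one-out 1-nearest-neighbor fitness with a small per-gene penalty, the V-shaped (tanh) bit-flip rule, and all parameter values (`n_nests`, `pa`, `step_scale`, `beta`) are assumptions chosen for illustration. Only stage 1 (variance filter) and stage 3 (a binary Cuckoo Search with Lévy flights) are sketched; the extremely randomized tree ranking of stage 2 is omitted here and would, in practice, typically come from a library such as scikit-learn (for example the `feature_importances_` of `ExtraTreesClassifier`).

```python
import math
import random
from statistics import pvariance


def variance_filter(X, keep):
    """Stage 1: indices of the `keep` highest-variance columns (genes)."""
    n_genes = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def levy_step(rng, beta=1.5):
    """One Levy-distributed step drawn with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def loo_1nn_accuracy(X, y, mask):
    """Assumed fitness core: leave-one-out 1-NN accuracy on the masked genes."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((row[j] - X[k][j]) ** 2 for j in cols))
        correct += y[nearest] == y[i]
    return correct / len(X)


def cuckoo_search(fitness, n_features, n_nests=15, n_iter=50, pa=0.25,
                  step_scale=0.1, seed=0):
    """Stage 3: binary Cuckoo Search that maximizes fitness(mask)."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]

    nests = [random_mask() for _ in range(n_nests)]
    scores = [fitness(m) for m in nests]
    for _ in range(n_iter):
        for i in range(n_nests):
            # Levy flight: flip each bit with probability |tanh(scaled step)|.
            candidate = []
            for bit in nests[i]:
                flip_prob = abs(math.tanh(step_scale * levy_step(rng)))
                candidate.append(1 - bit if rng.random() < flip_prob else bit)
            cand_score = fitness(candidate)
            j = rng.randrange(n_nests)  # compare against a randomly chosen nest
            if cand_score > scores[j]:
                nests[j], scores[j] = candidate, cand_score
        # Abandon the worst fraction `pa` of nests and rebuild them at random.
        worst = sorted(range(n_nests), key=scores.__getitem__)[:int(pa * n_nests)]
        for k in worst:
            nests[k] = random_mask()
            scores[k] = fitness(nests[k])
    best = max(range(n_nests), key=scores.__getitem__)
    return nests[best], scores[best]


# Toy data, made up for illustration: 6 samples x 5 genes; gene 3 is constant.
X = [
    [0.1, 5.0, 2.0, 1.0, 0.3],
    [0.2, 1.0, 2.1, 1.0, 0.9],
    [0.0, 3.0, 1.9, 1.0, 0.5],
    [0.9, 4.8, 2.0, 1.0, 0.4],
    [1.0, 1.2, 2.1, 1.0, 0.8],
    [0.8, 3.1, 1.9, 1.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

kept = variance_filter(X, keep=3)                    # stage 1
X_kept = [[row[j] for j in kept] for row in X]
mask, score = cuckoo_search(                         # stage 3
    lambda m: loo_1nn_accuracy(X_kept, y, m) - 0.01 * sum(m), len(kept))
selected = [kept[j] for j, bit in enumerate(mask) if bit]
print("selected genes:", selected, "fitness:", round(score, 3))
```

The per-gene penalty in the fitness is one simple way to favor small subsets, in the spirit of the abstract's report of high accuracy with only 11 genes; the chapter's actual objective function and classifier settings are not given in the abstract.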

Chapters in this book

  1. Frontmatter
  2. Contents
  3. List of Contributing Authors
  4. 1 Introduction: fundamentals of drug discovery, telemedicine, artificial intelligence, computer vision, and IoT
  5. 2 Machine learning transformations in drug discovery: a paradigm shift in development strategies
  6. 3 Explainable AI approaches in drug classification from biomarkers of epileptic seizure
  7. 4 Harnessing predictive analytics and machine learning in personalized medicine: patient outcomes and public health strategies
  8. 5 A data-driven framework for future healthcare diagnosis through predictive analytics
  9. 6 Revolutionizing home healthcare: telemedicine, predictive analytics, and AI-driven drug discovery
  10. 7 AI-driven insights: a machine learning approach to lung cancer diagnosis
  11. 8 Efficient gene selection for breast cancer classification using Brownian Motion Search Algorithm and Support Vector Machine
  12. 9 A hybrid feature gene selection approach by integrating variance filter, extremely randomized tree, and Cuckoo Search algorithm for cancer classification
  13. 10 HySleep_Net: a hybrid deep learning model for automatic sleep stage detection from polysomnographic signals
  14. 11 Ambulance booking and tracking website
  15. 12 Entropy based emergency rescue location selection with uncertain travel time
  16. 13 Performance comparison of different deep learning ensemble models for sentiment classification of movie reviews
  17. 14 Elevating standards in homoeopathic medicine: chemometric standardization of medicinal plant for quality assurance
  18. 15 Evaluation of genetic diversity in Rauvolfia species using Random Amplification of Polymorphic DNA (RAPD) technique
  19. Index
Downloaded on 10.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/9783111504667-009/html