3 Rapid and automated determination of cluster numbers for high-dimensional big data: a comprehensive update

This chapter is in the book Big Data, Data Mining and Data Science

Abstract

Automatically determining the optimal number of clusters is a pivotal challenge in clustering. Balancing clustering quality against algorithmic efficiency in this determination is the tradeoff that motivated our research. We have automated the identification of the optimal number of clusters, with a focus on large, high-dimensional datasets, and our method addresses both the quality and the efficiency of clustering. Experiments on five previously studied datasets [23] and on four new, larger datasets introduced in this study show that our procedure offers flexibility in selecting among diverse criteria for determining the optimal K in each situation. By leveraging the bisecting K-means algorithm, our approach outperforms the Ray and Turi method, identifying the best number of clusters more efficiently.
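To make the idea concrete, here is a minimal Python sketch. It is not the chapter's implementation, which this page does not describe. It assumes that a single bisecting K-means pass repeatedly splits the cluster with the largest sum of squared errors (SSE) using 2-means with a deterministic farthest-point initialization, and that each intermediate clustering is scored with the Ray and Turi validity index: the mean squared distance of points to their cluster centroid divided by the smallest squared distance between two centroids, where lower is better. The function names `two_means`, `ray_turi`, and `choose_k` and the `criterion` parameter are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Cluster) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points: Cluster) -> float:
    """Sum of squared distances from each point to the cluster centroid."""
    z = centroid(points)
    return sum(sq_dist(p, z) for p in points)


def two_means(points: Cluster, max_iter: int = 50) -> Tuple[Cluster, Cluster]:
    """Split one cluster in two with Lloyd's algorithm (k = 2).

    Assumption: deterministic farthest-point initialization, chosen here for
    reproducibility; the chapter's initialization is not stated on this page.
    """
    z = centroid(points)
    c1 = max(points, key=lambda p: sq_dist(p, z))   # farthest from the centroid
    c2 = max(points, key=lambda p: sq_dist(p, c1))  # farthest from c1
    for _ in range(max_iter):
        a = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        b = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        if not a or not b:
            break
        new1, new2 = centroid(a), centroid(b)
        if new1 == c1 and new2 == c2:  # same centroids, so assignments are stable
            break
        c1, c2 = new1, new2
    return a, b


def ray_turi(clusters: List[Cluster]) -> float:
    """Ray-Turi validity: mean intra-cluster squared distance divided by the
    minimum squared distance between two centroids (lower is better)."""
    n = sum(len(c) for c in clusters)
    cents = [centroid(c) for c in clusters]
    intra = sum(sq_dist(p, z) for c, z in zip(clusters, cents) for p in c) / n
    inter = min(
        sq_dist(cents[i], cents[j])
        for i in range(len(cents))
        for j in range(i + 1, len(cents))
    )
    return intra / inter if inter > 0 else math.inf


def choose_k(
    points: Sequence[Point],
    k_max: int,
    criterion: Callable[[List[Cluster]], float] = ray_turi,
) -> Tuple[int, Dict[int, float]]:
    """Run one bisecting K-means pass up to k_max clusters, score every
    intermediate K with `criterion`, and return the K with the lowest score."""
    clusters: List[Cluster] = [list(points)]
    scores: Dict[int, float] = {}
    while len(clusters) < k_max:
        # Assumed splitting rule: bisect the cluster with the largest SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters[idx]
        if len(set(target)) < 2:  # only identical points left; nothing to split
            break
        a, b = two_means(target)
        if not a or not b:
            break
        clusters[idx:idx + 1] = [a, b]
        scores[len(clusters)] = criterion(clusters)
    if not scores:
        return 1, scores
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For example, `choose_k(points, k_max=10)` returns the selected K together with a score for every K from 2 to 10. Because one bisecting pass yields a candidate clustering for each K, any function passed as `criterion` is evaluated along that single hierarchy instead of re-running K-means from scratch for every K; swapping in a different index changes only that argument.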

Chapters in this book

  Frontmatter
  Preface
  Contents

Methods and instrumentation
  1. Identifying and estimating outliers in time series with nonstationary mean through multiobjective optimization method
  2. Using the intentionally linked entities (ILE) database system to create hypergraph databases with fast and reliable relationship linking, with example applications
  3. Rapid and automated determination of cluster numbers for high-dimensional big data: a comprehensive update
  4. Canonical correlation analysis and exploratory factor analysis of the four major centrality metrics
  5. Navigating the landscape of automated data preprocessing: an in-depth review of automated machine learning platforms
  6. Generating random XML

Applications and case studies
  7. Exploring autism risk: a deep dive into graph neural networks and gene interaction data
  8. Leveraging ChatGPT and table arrangement techniques in advanced newspaper content analysis for stock insights
  9. An experimental study on road surface classification
  10. RNN models for evaluating financial indices: examining volatility and demand-supply shifts in financial markets during COVID-19
  11. Topological methods for vibration feature extraction
  12. Dyna-SPECTS: DYNAmic enSemble of Price Elasticity Computation models using Thompson Sampling in e-commerce
  13. Creating a metadata schema for reservoirs of data: a systems engineering approach
  14. Implementation and evaluation of an eXplainable artificial intelligence to explain the evaluation of an assessment analytics algorithm for freetext exams in psychology courses in higher education to attest QBLM-based competencies
  15. Toward a skill-centered qualification ontology supporting data mining of human resources in knowledge-based enterprise process representations

  Index
Downloaded on 26.3.2026 from https://www.degruyterbrill.com/document/doi/10.1515/9783111344553-003/html