3 Rapid and automated determination of cluster numbers for high-dimensional big data: a comprehensive update

Zohreh Safari

Chapter

This chapter is in the book Big Data, Data Mining and Data Science

Abstract

Automatically determining the optimal number of clusters is a pivotal challenge in clustering. The tradeoff between clustering quality and algorithmic efficiency in this determination motivated our research. Our approach automates the identification of the optimal number of clusters and is tailored to large, high-dimensional datasets, addressing both the quality and the efficiency of clustering. In experiments on five previously explored datasets [23] and four new, larger datasets introduced in this study, we observed that the procedure offers flexibility in selecting different criteria for determining the optimal K in each circumstance. By leveraging the advantages of the bisecting K-means algorithm, our approach outperforms the Ray and Turi method, identifying the best number of clusters more efficiently.
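The abstract names two ingredients without giving details: bisecting K-means and the Ray and Turi criterion. As a rough illustration only, and not the chapter's actual procedure, the sketch below grows a partition by repeatedly bisecting the cluster with the largest squared error and scores each K with the intra/inter validity ratio usually attributed to Ray and Turi (mean squared distance to the own centroid, divided by the smallest squared distance between centroids; smaller is better). All function names, the deterministic seeding rule, and the toy data are assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def bisect(points, max_iter=50):
    """Split one cluster in two with 2-means (assumed deterministic seeding)."""
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))  # seed 1: farthest from centroid
    b = max(points, key=lambda p: sq_dist(p, a))  # seed 2: farthest from seed 1
    left, right = list(points), []
    for _ in range(max_iter):
        left = [p for p in points if sq_dist(p, a) <= sq_dist(p, b)]
        right = [p for p in points if sq_dist(p, a) > sq_dist(p, b)]
        if not right:
            break
        new_a, new_b = centroid(left), centroid(right)
        if (new_a, new_b) == (a, b):  # assignments have stabilised
            break
        a, b = new_a, new_b
    return left, right

def bisecting_partitions(points, k_max):
    """Return {k: clusters} for k = 2..k_max, bisecting the worst cluster each step."""
    clusters = [list(points)]
    partitions = {}
    while len(clusters) < k_max:
        worst = max(clusters, key=sse)
        if sse(worst) == 0:  # nothing left that can be split
            break
        left, right = bisect(worst)
        clusters.remove(worst)
        clusters.extend([left, right])
        partitions[len(clusters)] = [list(c) for c in clusters]
    return partitions

def validity(clusters):
    """Intra/inter ratio: compact, well separated partitions score low."""
    n = sum(len(c) for c in clusters)
    centers = [centroid(c) for c in clusters]
    intra = sum(sse(c) for c in clusters) / n
    inter = min(sq_dist(centers[i], centers[j])
                for i in range(len(centers))
                for j in range(i + 1, len(centers)))
    return float("inf") if inter == 0 else intra / inter

def choose_k(points, k_max):
    """Pick the K in 2..k_max with the smallest validity score."""
    scores = {k: validity(c) for k, c in bisecting_partitions(points, k_max).items()}
    return min(scores, key=scores.get), scores

# Toy data (hypothetical): three tight 2-D blobs of five points each.
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]
best_k, scores = choose_k(points, k_max=5)
print(best_k, scores)
```

On this toy data the score is smallest at K = 3. Because each step splits only one cluster, all candidate values of K come out of a single pass, which is one plausible reason a bisecting scheme can be cheaper than rerunning K-means from scratch for every K.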

https://doi.org/10.1515/9783111344553-003
