Authors :
Gummadi Venkata Nikhil Sai; Robby Aulia Tubagus; Vasala Rohith; Haritha Donavalli
Volume/Issue :
Volume 9 - 2024, Issue 10 - October
Google Scholar :
https://tinyurl.com/y4w3d47n
Scribd :
https://tinyurl.com/59sm6ahk
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24OCT1507
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Abstract :
Clustering algorithms play a critical role in data
analysis by grouping similar data points to reveal hidden
pat- terns and structures. This study investigates the
performance of several clustering algorithms using two
distinct datasets: moons and circles. The primary focus is
on evaluating and comparing the execution times of these
algorithms to determine their efficiency and effectiveness
in handling different types of data distributions. Through
a series of experiments and performance measurements,
this paper aims to provide a detailed analysis of each
algorithm’s computational efficiency and suitability for
various clustering tasks. The findings are expected to offer
practical insights into the selection and application of
clustering methods, contributing to enhanced data
analysis techniques and informed decision-making in
diverse fields.
Keywords :
Non-Convex Optimization, K-Means, Machine Learning, Comparative Analysis.
References :
- Guha, S., Mishra, N., Roy, G., & Schrijvers, O. (2016). Robust ran- dom cut forest based anomaly detection on streams. Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, 48, 2712-2721. Available at: https://proceedings.mlr.press/v48/guha16.html.
- Aggarwal, C. C., & Reddy, C. K. (2014). Data clustering: Algorithms and applications. CRC Press. (pp. 3, 5). Available at: https://people. cs.vt.edu/~reddy/papers/DCBOOK.pdf.
- Xu, R., & Wunsch, D. C. (2009). Clustering. Wiley-IEEE Press. (p. 2). ISBN: 9780470276808. DOI: 10.1002/9780470382776.
- Har-Peled, S. (2012). Geometric approximation algorithms. American Mathematical Society. (p. 10). Available at: https://graphics.stanford. edu/courses/cs468-06-fall/Papers/01%20har-peled%20notes.pdf.
- Pelleg, D., & Moore, A. W. (2000). X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning (pp. 727-734).
- MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281-297).
- Scikit-learn developers. (2024). sklearn.datasets.make_moons. In Scikit-learn documentation. Available at https://scikit-learn.org/ stable/modules/generated/sklearn.datasets.make_moons.html.
- Scikit-learn developers. (2024). sklearn.datasets.make_circles. In Scikit-learn documentation. Available at https://scikit-learn.org/ stable/modules/generated/sklearn.datasets.make_circles.html.
- Selim SZ, Ismail MA. K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell. 1984 Jan;6(1):81-7. doi: 10.1109/tpami.1984.4767478. PMID: 21869168.
- Rui Xu and D. Wunsch, "Survey of clustering algorithms," in IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, May 2005, doi: 10.1109/TNN.2005.845141.
- Bhargav, Sushant & Pawar, Mahesh. (2016). A Review of Clustering Methods forming Non-Convex clusters with, Missing and Noisy Data. INTERNATIONAL JOURNAL OF COMPUTER SCIENCES AND ENGI- NEERING. 3. 39-44.
- S. Lloyd, "Least squares quantization in PCM," in IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129-137, March 1982, doi: 10.1109/TIT.1982.1056489.
- Arthur, David & Vassilvitskii, Sergei. (2007). K-Means++: The Advan- tages of Careful Seeding. Proc. of the Annu. ACM-SIAM Symp. on Discrete Algorithms. 8. 1027-1035. 10.1145/1283383.1283494.
- K. Krishna and M. Narasimha Murty, "Genetic K-means algo- rithm," in IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 29, no. 3, pp. 433-439, June 1999, doi: 10.1109/3477.764879.
- Ingber, L. (1993). Simulated annealing: Practice versus theory. Math- ematical and Computer Modelling, 18(11), 29-57. DOI: 10.1016/0895- 7177(93)90204-C.
- Steinbach, Michael & Karypis, George & Kumar, Vipin. (2000). A Comparison of Document Clustering Techniques. Proceedings of the International KDD Workshop on Text Mining.
- Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001). "Constrained k-means clustering with background knowledge." Proceedings of the Eighteenth International Conference on Machine Learning (ICML).
- Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). "On spectral clustering: Analysis and an algorithm." Advances in Neural Information Process- ing Systems (NIPS).
- Dhillon, I. S., Guan, Y.,& Kulis, B. (2004). "Kernel k-means: Spectral clustering and normalized cuts." Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10(2–3), 191-203. ISSN 0098-3004. DOI: 10.1016/0098-3004(84)90020-7.
- Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22. ISSN 0035-9246. DOI: 10.1111/j.2517-6161.1977.tb01600.x.
- Sculley, D. (2010). "Web-scale k-means clustering." Proceedings of the 19th International Conference on World Wide Web.1177-1178. DOI: 10.1145/1772690.1772862
Clustering algorithms play a critical role in data
analysis by grouping similar data points to reveal hidden
pat- terns and structures. This study investigates the
performance of several clustering algorithms using two
distinct datasets: moons and circles. The primary focus is
on evaluating and comparing the execution times of these
algorithms to determine their efficiency and effectiveness
in handling different types of data distributions. Through
a series of experiments and performance measurements,
this paper aims to provide a detailed analysis of each
algorithm’s computational efficiency and suitability for
various clustering tasks. The findings are expected to offer
practical insights into the selection and application of
clustering methods, contributing to enhanced data
analysis techniques and informed decision-making in
diverse fields.
Keywords :
Non-Convex Optimization, K-Means, Machine Learning, Comparative Analysis.