Comparative Analysis of K-Means Technique on Non-Convex Clusters


Authors : Gummadi Venkata Nikhil Sai; Robby Aulia Tubagus; Vasala Rohith; Haritha Donavalli

Volume/Issue : Volume 9 - 2024, Issue 10 - October


Google Scholar : https://tinyurl.com/y4w3d47n

Scribd : https://tinyurl.com/59sm6ahk

DOI : https://doi.org/10.38124/ijisrt/IJISRT24OCT1507



Abstract : Clustering algorithms play a critical role in data analysis by grouping similar data points to reveal hidden patterns and structures. This study investigates the performance of several clustering algorithms using two distinct datasets: moons and circles. The primary focus is on evaluating and comparing the execution times of these algorithms to determine their efficiency and effectiveness in handling different types of data distributions. Through a series of experiments and performance measurements, this paper aims to provide a detailed analysis of each algorithm’s computational efficiency and suitability for various clustering tasks. The findings are expected to offer practical insights into the selection and application of clustering methods, contributing to enhanced data analysis techniques and informed decision-making in diverse fields.

Keywords : Non-Convex Optimization, K-Means, Machine Learning, Comparative Analysis.
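
As a hedged illustration of the experimental setup the abstract describes, the Python sketch below generates the moons and circles datasets with scikit-learn and times two clustering algorithms on each. The sample size, noise levels, and the specific algorithms shown (K-Means and spectral clustering) are illustrative assumptions, not the paper's exact configuration.

```python
import time

from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles, make_moons

# Two synthetic non-convex datasets; parameters here are illustrative assumptions.
datasets = {
    "moons": make_moons(n_samples=1000, noise=0.05, random_state=42),
    "circles": make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=42),
}

# Two example algorithms to compare; the paper may evaluate a different set.
algorithms = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=42),
    "spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                   random_state=42),
}

for data_name, (X, _) in datasets.items():
    for algo_name, algo in algorithms.items():
        start = time.perf_counter()
        labels = algo.fit_predict(X)  # refit the estimator on each dataset
        elapsed = time.perf_counter() - start
        print(f"{data_name:8s} | {algo_name:8s} | {elapsed:.4f} s")
```

On these shapes, K-Means tends to split each crescent or ring in half because it assumes roughly convex, compact clusters, whereas a graph-based method such as spectral clustering can recover the non-convex groups at a higher computational cost; timing both on the same data illustrates the efficiency-versus-flexibility trade-off the study examines.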

