Investigating the Impact of Sample Size on the Performance of the k-NN Algorithm


Authors : Zara Wong

Volume/Issue : Volume 9 - 2024, Issue 5 - May

Google Scholar : https://tinyurl.com/4uurzrsv

Scribd : https://tinyurl.com/mrs9hewe

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAY517

Abstract : The k-Nearest Neighbour (k-NN) algorithm is a simple and intuitive algorithm widely used for pattern recognition and classification tasks. This research paper addresses a gap in the literature by exploring the relationship between sample size and the performance of the k-NN algorithm. Through intensive experimental analysis of secondary data, we investigate how varying sample sizes influence the algorithm's classification accuracy, computational efficiency, and generalization capabilities. Our findings reveal that accuracy stabilizes once the sample size exceeds roughly 190, with minimal variation beyond that point; the maximum of the accuracy curve occurs at a sample size of 340, suggesting it as the optimal value for this training model and scope. These findings contribute to a deeper understanding of the interplay between sample size and k-NN performance, suggest an ideal sample size, and aid practitioners in making informed decisions when employing this method in real-world applications.
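The sample-size sweep described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the dataset (two synthetic Gaussian clusters), the choice of k = 5, and the list of sample sizes are all illustrative assumptions, chosen only to show how classification accuracy can be measured as the training set grows.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def make_blob(centre, label, n, rng):
    """Generate n labelled points scattered around `centre` (Gaussian noise)."""
    return [(tuple(c + rng.gauss(0, 1.0) for c in centre), label)
            for _ in range(n)]

def accuracy_at_size(n_train, rng):
    """Train on n_train points (two balanced classes) and score on a held-out set."""
    train = (make_blob((0, 0), 0, n_train // 2, rng)
             + make_blob((4, 4), 1, n_train // 2, rng))
    test = make_blob((0, 0), 0, 100, rng) + make_blob((4, 4), 1, 100, rng)
    correct = sum(knn_predict(train, x, k=5) == y for x, y in test)
    return correct / len(test)

# Sweep over increasing training-set sizes, mirroring the paper's experiment.
rng = random.Random(0)
for n in (20, 100, 200, 340):
    print(n, round(accuracy_at_size(n, rng), 3))
```

On well-separated clusters like these, accuracy plateaus quickly; the paper's reported thresholds (about 190, with a peak at 340) come from its own data and will not reproduce exactly on other datasets.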

Keywords : Systems Software; Algorithms; Machine Learning; k-Nearest Neighbour; Sample Size.

References :

  1. Ali, N., Neagu, D., & Trundle, P. (2019). Evaluation of k-nearest neighbor classifier performance for heterogeneous data sets. SN Applied Sciences, 1(12), 1559. https://doi.org/10.1007/s42452-019-1356-9
  2. Trivedi, U. B., Bhatt, M., & Srivastava, P. (2021). Prevent Overfitting Problem in Machine Learning: A Case Focus on Linear Regression and Logistics Regression. In P. K. Singh, Z. Polkowski, S. Tanwar, S. K. Pandey, G. Matei, & D. Pirvu (Eds.), Innovations in Information and Communication Technologies  (IICT-2020) (pp. 345–349). Springer International Publishing. https://doi.org/10.1007/978-3-030-66218-9_40
  3. Maia Polo, F., & Vicente, R. (2023). Effective sample size, dimensionality, and generalization in covariate shift adaptation. Neural Computing and Applications, 35(25), 18187–18199. https://doi.org/10.1007/s00521-021-06615-1
  4. Helm, J. M., Swiergosz, A. M., Haeberle, H. S., Karnuta, J. M., Schaffer, J. L., Krebs, V. E., Spitzer, A. I., & Ramkumar, P. N. (2020). Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Current Reviews in Musculoskeletal Medicine, 13(1), 69–76. https://doi.org/10.1007/s12178-020-09600-8
  5. Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3), 685–695. https://doi.org/10.1007/s12525-021-00475-2
  6. Mithy, S., Hossain, S., Akter, S., . U. H., & Sogir, S. (2022). Classification of Iris Flower Dataset using Different Algorithms. 9, 1–10. https://doi.org/10.26438/ijsrmss/v9i6.110
  7. Uddin, S., Haque, I., Lu, H., Moni, M. A., & Gide, E. (2022). Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Scientific Reports, 12(1), Article 1. https://doi.org/10.1038/s41598-022-10358-x

