Investigating the Impact of Sample Size on the Performance of the k-NN Algorithm


Authors : Zara Wong

Volume/Issue : Volume 9 - 2024, Issue 5 - May

Google Scholar : https://tinyurl.com/4uurzrsv

Scribd : https://tinyurl.com/mrs9hewe

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAY517

Abstract : The k-Nearest Neighbour (k-NN) algorithm is a simple and intuitive algorithm widely used for pattern recognition and classification tasks. This research paper addresses a gap in the literature by exploring the relationship between sample size and the performance of the k-NN algorithm. Through intensive experimental analysis of secondary data, we investigate how varying sample sizes influence the algorithm's classification accuracy, computational efficiency, and generalization capabilities. Our findings reveal that accuracy stabilizes once the sample size exceeds roughly 190, with minimal variation beyond that point; the maximum of the accuracy curve occurs at a sample size of 340, suggesting it as the optimal value for this training model and scope. These findings contribute to a deeper understanding of the interplay between sample size and k-NN performance, suggest an ideal sample size, and aid practitioners in making informed decisions when employing this method in real-world applications.
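The sample-size sweep described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the dataset (two synthetic Gaussian clusters), the choice of k = 5, and the list of sample sizes are all illustrative assumptions, chosen only to show how classification accuracy can be measured as the training set grows.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def make_blob(centre, label, n, rng):
    """Generate n labelled points scattered around `centre` (Gaussian noise)."""
    return [(tuple(c + rng.gauss(0, 1.0) for c in centre), label)
            for _ in range(n)]

def accuracy_at_size(n_train, rng):
    """Train on n_train points (two balanced classes) and score on a held-out set."""
    train = (make_blob((0, 0), 0, n_train // 2, rng)
             + make_blob((4, 4), 1, n_train // 2, rng))
    test = make_blob((0, 0), 0, 100, rng) + make_blob((4, 4), 1, 100, rng)
    correct = sum(knn_predict(train, x, k=5) == y for x, y in test)
    return correct / len(test)

# Sweep over increasing training-set sizes, mirroring the paper's experiment.
rng = random.Random(0)
for n in (20, 100, 200, 340):
    print(n, round(accuracy_at_size(n, rng), 3))
```

On well-separated clusters like these, accuracy plateaus quickly; the paper's reported thresholds (about 190, with a peak at 340) come from its own data and will not reproduce exactly on other datasets.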

Keywords : Systems Software; Algorithms; Machine Learning; k-Nearest Neighbour; Sample Size.

References :

  1. Ali, N., Neagu, D., & Trundle, P. (2019). Evaluation of k-nearest neighbor classifier performance for heterogeneous data sets. SN Applied Sciences, 1(12), 1559. https://doi.org/10.1007/s42452-019-1356-9
  2. Trivedi, U. B., Bhatt, M., & Srivastava, P. (2021). Prevent Overfitting Problem in Machine Learning: A Case Focus on Linear Regression and Logistics Regression. In P. K. Singh, Z. Polkowski, S. Tanwar, S. K. Pandey, G. Matei, & D. Pirvu (Eds.), Innovations in Information and Communication Technologies  (IICT-2020) (pp. 345–349). Springer International Publishing. https://doi.org/10.1007/978-3-030-66218-9_40
  3. Maia Polo, F., & Vicente, R. (2023). Effective sample size, dimensionality, and generalization in covariate shift adaptation. Neural Computing and Applications, 35(25), 18187–18199. https://doi.org/10.1007/s00521-021-06615-1
  4. Helm, J. M., Swiergosz, A. M., Haeberle, H. S., Karnuta, J. M., Schaffer, J. L., Krebs, V. E., Spitzer, A. I., & Ramkumar, P. N. (2020). Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Current Reviews in Musculoskeletal Medicine, 13(1), 69–76. https://doi.org/10.1007/s12178-020-09600-8
  5. Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3), 685–695. https://doi.org/10.1007/s12525-021-00475-2
  6. Mithy, S., Hossain, S., Akter, S., . U. H., & Sogir, S. (2022). Classification of Iris Flower Dataset using Different Algorithms. 9, 1–10. https://doi.org/10.26438/ijsrmss/v9i6.110
  7. Uddin, S., Haque, I., Lu, H., Moni, M. A., & Gide, E. (2022). Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Scientific Reports, 12(1), Article 1. https://doi.org/10.1038/s41598-022-10358-x

