Authors :
Zara Wong
Volume/Issue :
Volume 9 - 2024, Issue 5 - May
Google Scholar :
https://tinyurl.com/4uurzrsv
Scribd :
https://tinyurl.com/mrs9hewe
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24MAY517
Abstract :
The k-Nearest Neighbour (k-NN) algorithm is a simple and intuitive method for pattern recognition and classification tasks. This paper addresses a gap in the literature by exploring the relationship between sample size and the performance of the k-NN algorithm. Through extensive experimental analysis of secondary data, we investigate how varying sample sizes influence the algorithm's classification accuracy, computational efficiency, and generalization capability. Our findings reveal that classification accuracy stabilizes once the sample size exceeds 190, with minimal variation beyond that point, and peaks at a sample size of 340, suggesting this as the optimal value for the training model and scope studied here. These findings deepen understanding of the interplay between sample size and k-NN performance, suggest an ideal sample size, and aid practitioners in making informed decisions when applying this method to real-world problems.
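To make the experimental setup concrete, the sketch below shows one way such a sample-size study could be run. It is a minimal illustration, not the paper's actual code: it assumes Python with scikit-learn, uses the library's bundled Iris data as a stand-in dataset, and the candidate sizes and the choice of k = 5 are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset: 150 samples, 3 classes (the paper's own data differs).
X, y = load_iris(return_X_y=True)

# Hold out a fixed test set so every sample size is scored on the same data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, stratify=y, random_state=0)

for n in (12, 30, 60, 90):  # illustrative training-set sizes
    # Draw a stratified subsample of n points from the training pool.
    X_sub, _, y_sub, _ = train_test_split(
        X_train, y_train, train_size=n, stratify=y_train, random_state=0)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_sub, y_sub)
    print(f"n={n:3d}  test accuracy = {model.score(X_test, y_test):.3f}")

Plotting accuracy against n in this fashion, on a larger dataset, is how a stabilization point such as the one reported above (beyond 190, peaking at 340) would be identified.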
Keywords :
Systems Software; Algorithms; Machine Learning; k-Nearest Neighbour; Sample Size.
References :
- Ali, N., Neagu, D., & Trundle, P. (2019). Evaluation of k-nearest neighbor classifier performance for heterogeneous data sets. SN Applied Sciences, 1(12), 1559. https://doi.org/10.1007/s42452-019-1356-9
- Trivedi, U. B., Bhatt, M., & Srivastava, P. (2021). Prevent Overfitting Problem in Machine Learning: A Case Focus on Linear Regression and Logistics Regression. In P. K. Singh, Z. Polkowski, S. Tanwar, S. K. Pandey, G. Matei, & D. Pirvu (Eds.), Innovations in Information and Communication Technologies (IICT-2020) (pp. 345–349). Springer International Publishing. https://doi.org/10.1007/978-3-030-66218-9_40
- Maia Polo, F., & Vicente, R. (2023). Effective sample size, dimensionality, and generalization in covariate shift adaptation. Neural Computing and Applications, 35(25), 18187–18199. https://doi.org/10.1007/s00521-021-06615-1
- Helm, J. M., Swiergosz, A. M., Haeberle, H. S., Karnuta, J. M., Schaffer, J. L., Krebs, V. E., Spitzer, A. I., & Ramkumar, P. N. (2020). Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Current Reviews in Musculoskeletal Medicine, 13(1), 69–76. https://doi.org/10.1007/s12178-020-09600-8
- Janiesch, C., Zschech, P., & Heinrich, K. (2021). Machine learning and deep learning. Electronic Markets, 31(3), 685–695. https://doi.org/10.1007/s12525-021-00475-2
- Mithy, S., Hossain, S., Akter, S., U. H., & Sogir, S. (2022). Classification of Iris Flower Dataset using Different Algorithms. 9(6), 1–10. https://doi.org/10.26438/ijsrmss/v9i6.110
- Uddin, S., Haque, I., Lu, H., Moni, M. A., & Gide, E. (2022). Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Scientific Reports, 12(1), Article 1. https://doi.org/10.1038/s41598-022-10358-x