Authors :
Saeed H. A.; Naeem N. Alyona A. E.
Volume/Issue :
Volume 10 - 2025, Issue 2 - February
Google Scholar :
https://tinyurl.com/59r4zsvk
DOI :
https://doi.org/10.38124/ijisrt/25Feb1337
Abstract :
Healthcare fraud is a fast-growing problem that causes substantial financial losses and degrades the quality of patient
care. Conventional fraud detection techniques often fail to identify fraudulent claims because healthcare data is complex
and voluminous. This study investigates the use of machine learning methods to improve fraud detection in healthcare
systems. We compare the performance of Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Logistic
Regression before and after hyperparameter tuning and feature selection. Forward feature selection was applied to KNN
and Logistic Regression to improve model performance by retaining the most salient features, while hyperparameter
tuning was applied to all models. Accuracy, precision, recall, F1-score, confusion matrices, and ROC curves were used to
evaluate the models. The results show that, after tuning and feature selection, Logistic Regression achieved higher
accuracy than the other models in identifying fraudulent claims. The Voting Classifier, an ensemble learning method,
improved fraud detection by aggregating the predictions of multiple models. Although Decision Tree and Random Forest
performed well, hyperparameter tuning did not improve their accuracy. These results indicate that machine learning
methods, especially ensemble models and feature selection, can markedly improve healthcare fraud detection. Future
studies should integrate deep learning and advanced ensemble techniques to further improve fraud detection accuracy
and reduce false positives.
Keywords :
Healthcare Fraud, Machine Learning, Fraud Detection, Hyperparameter Tuning, Feature Selection, Ensemble Learning.
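The workflow summarized in the abstract (forward feature selection, hyperparameter tuning, a voting ensemble, and the listed evaluation metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names neither a library nor a dataset, so the use of scikit-learn, the synthetic imbalanced data, the choice of five selected features, the parameter grid, and the soft-voting setup are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic, imbalanced data standing in for real claims records.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Logistic Regression with forward feature selection, tuned by grid search.
# Selection sits inside the pipeline so it is refit on every CV training fold.
logreg_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000),
        n_features_to_select=5,  # hypothetical choice, not from the paper
        direction="forward", cv=3)),
    ("clf", LogisticRegression(max_iter=1000)),
])
logreg_search = GridSearchCV(
    logreg_pipeline, {"clf__C": [0.1, 1.0, 10.0]}, scoring="f1", cv=3)

# Soft-voting ensemble that averages predicted probabilities of four models.
voting = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("logreg", make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000))),
    ],
    voting="soft",
)


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    fraud_probability = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted),
        "f1": f1_score(y_test, predicted),
        "roc_auc": roc_auc_score(y_test, fraud_probability),
        "confusion_matrix": confusion_matrix(y_test, predicted),
    }


results = {
    "logreg_forward_selection": evaluate(logreg_search),
    "soft_voting": evaluate(voting),
}
# Boolean mask of the features kept by forward selection in the best pipeline.
selected_mask = logreg_search.best_estimator_.named_steps["select"].get_support()

for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()
                 if k != "confusion_matrix"})
print("selected feature indices:", selected_mask.nonzero()[0].tolist())
```

On imbalanced fraud data, accuracy alone can look high while recall on the fraud class stays low, which is one reason to report precision, recall, F1-score, and the confusion matrix alongside it, as the abstract does.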