Stroke Prediction using Machine Learning


Authors : Divya P.; Ajmal Ahamed R.; Duraiarasi S.; Jaisuriya N.; Jayashree A.

Volume/Issue : Volume 10 - 2025, Issue 2 - February


Google Scholar : https://tinyurl.com/zrbvju3e

Scribd : https://tinyurl.com/58v9h5t5

DOI : https://doi.org/10.5281/zenodo.14944955


Abstract : In this work, we aimed to predict the incidence of stroke using machine learning. The dataset includes demographic and health-related variables such as age, gender, heart disease, hypertension, and smoking status. After preprocessing the data, which included encoding categorical variables and handling missing values, we trained several classifiers: Random Forest, AdaBoost, and Gradient Boosting. We evaluated the models using accuracy, precision, recall, and F1-score. The Random Forest Classifier achieved the highest accuracy, 94.72% on the test set. To address class imbalance, we also applied Random Over Sampler and SMOTE (Synthetic Minority Over-Sampling Technique), which improved the models' ability to identify stroke cases. Overall, our findings suggest that machine learning algorithms, and the Random Forest Classifier in particular, may be able to predict the incidence of stroke from demographic and health-related data.
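
The abstract reports accuracy alongside precision, recall, and F1-score, and mentions random oversampling for class imbalance. The sketch below illustrates why those extra metrics matter on an imbalanced dataset. It is not the authors' pipeline: the 95/5 class split, the age-only feature rows, and the helper names `classification_metrics` and `random_oversample` are assumptions made purely for illustration. In practice, libraries such as imbalanced-learn provide `RandomOverSampler` and `SMOTE`.

```python
import random
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical labels: 950 non-stroke (0) and 50 stroke (1) records.
y_true = [0] * 950 + [1] * 50

# A trivial baseline that never predicts a stroke still scores high accuracy.
always_negative = [0] * len(y_true)
baseline = classification_metrics(y_true, always_negative)

# Hypothetical single-feature rows (age only), balanced by random oversampling.
ages = [[40 + i % 40] for i in range(len(y_true))]
balanced_rows, balanced_labels = random_oversample(ages, y_true)
balanced_counts = Counter(balanced_labels)
```

Under these assumed numbers the baseline reaches 95% accuracy while its recall and F1-score are zero, which is why accuracy alone says little on imbalanced data. Oversampling is normally applied to the training split only, so that duplicated minority rows do not leak into the test set.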

Keywords : Brain Stroke, Cerebrovascular Accident, Oxygen and Nutrients, Ischemic Stroke.
