Development of Random Forest Model for Stroke Prediction


Authors : Nnanna, Chidera Egegamuka; Nnanna, Ekedebe; Ajoku, Kingsley Kelechi; Okafor, Chidozie Raymond Patrick; Ozor, Chidinma C

Volume/Issue : Volume 9 - 2024, Issue 4 - April

Google Scholar : https://tinyurl.com/2a3u5kn8

Scribd : https://tinyurl.com/4rtxzs6x

DOI : https://doi.org/10.38124/ijisrt/IJISRT24APR2566

Abstract : Stroke is a significant cause of mortality and morbidity worldwide, and early detection and prevention are essential for improving patient outcomes. In recent years, machine learning algorithms have been used to predict stroke risk from large amounts of clinical and demographic data. The main objective of this thesis is to develop a stroke prediction system using the Random Forest machine learning algorithm. The project aims to increase the accuracy of stroke detection while addressing the shortcomings of the existing logistic regression system, namely its lack of real-time deployment and its interpretability issues. The specific goals are to develop an ensemble machine learning-based stroke prediction system, optimize its performance, assess that performance, and deploy the model in real time using Python Django. The study is significant because early diagnosis and treatment can lessen the severity and consequences of strokes and thereby improve public health. The research method comprises data collection, preprocessing, model selection, evaluation, and real-time deployment using Python Django. The dataset consists of 5110 rows (tuples), with a total size of 69 KB. The performance of the stroke prediction model was evaluated using confusion-matrix metrics: accuracy, precision, recall, and F1-score. The Random Forest model achieved an accuracy of 98.5%, compared with 86% for the existing logistic regression model.

Keywords : Machine Learning Algorithms, Preprocessing, Random Forest Model, Confusion Matrix, F-Score Measurement, Stroke Prediction.
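
The abstract names accuracy, precision, recall, and F1-score as the evaluation metrics. As a minimal sketch of how these four values follow from a binary confusion matrix, the Python below tallies the matrix cells and derives each metric. The labels are made-up toy values (1 = stroke, 0 = no stroke), not the paper's data, and the names `ConfusionCounts`, `count_outcomes`, and `summarize` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke cases predicted as stroke
    fn: int  # stroke cases predicted as non-stroke
    tn: int  # non-stroke cases predicted as non-stroke


def count_outcomes(y_true, y_pred):
    """Tally the four confusion-matrix cells for binary labels (1 = stroke)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fp=fp, fn=fn, tn=tn)


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric's denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(counts):
    """Derive accuracy, precision, recall, and F1 from the matrix cells."""
    total = counts.tp + counts.fp + counts.fn + counts.tn
    precision = _safe_div(counts.tp, counts.tp + counts.fp)
    recall = _safe_div(counts.tp, counts.tp + counts.fn)
    return {
        "accuracy": _safe_div(counts.tp + counts.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


# Toy labels (assumed for illustration): 2 stroke cases among 20 records.
y_true = [1, 1] + [0] * 18
always_negative = [0] * 20          # a model that never predicts stroke
mixed = [1, 0, 1] + [0] * 17        # one hit, one miss, one false alarm

for name, y_pred in [("always_negative", always_negative), ("mixed", mixed)]:
    print(name, summarize(count_outcomes(y_true, y_pred)))
```

In this toy setting both prediction vectors reach the same accuracy, yet the first one never detects a stroke and so has zero recall. This is one reason to report precision, recall, and F1-score alongside accuracy, as the abstract does, particularly if stroke cases are a small share of the records.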


