Predictive Modeling of Cyber Incident Escalation Risk in Hospital Electronic Medical Record (EMR) Systems Using Ensemble Learning Models


Authors : Genevieve Donkor Armah; Idoko Peter Idoko; Yewande Iyimide Adeyeye; Lawrence Anebi Enyejo; Tony Isioma Azonuche

Volume/Issue : Volume 11 - 2026, Issue 2 - February


Google Scholar : https://tinyurl.com/2r28ez3k

Scribd : https://tinyurl.com/dy673tpk

DOI : https://doi.org/10.38124/ijisrt/26feb578



Abstract : Hospital Electronic Medical Record (EMR) systems constitute mission-critical clinical infrastructure whose compromise can directly disrupt care delivery, threaten patient safety, and expose healthcare organizations to regulatory and financial risk. Contemporary security operations in hospitals remain largely reactive, relying on static severity scoring and post-incident classification that provide limited support for anticipating whether an observed cyber event will escalate into a high-impact incident. This study addresses this gap by proposing a predictive, time-bounded framework for modeling cyber incident escalation risk in EMR environments using ensemble learning methods. We conduct a retrospective observational study leveraging multi-source, PHI-safe operational telemetry, including security alerts, identity and access logs, endpoint and network signals, EMR audit metadata, and incident management records. Escalation is formalized as a forward-looking outcome defined by severity reclassification, containment intensity, operational downtime, or confirmed data compromise within specified horizons (6, 24, and 72 hours). A comprehensive feature engineering strategy integrates event-level indicators, identity anomalies, endpoint behaviors, network propagation signals, and EMR workflow context. Multiple ensemble models (bagging, boosting, and stacking) are evaluated against baseline approaches under severe class imbalance, with emphasis on probability calibration and decision-aligned metrics. Results demonstrate that ensemble models substantially outperform baselines in identifying escalation-prone events, particularly at short and medium horizons, while calibrated probabilities enable actionable threshold-based triage. Explainability analysis reveals that escalation risk is driven by the interaction of identity misuse, lateral movement, privilege changes, and anomalous EMR access patterns rather than isolated signals. Operational case studies show how probabilistic escalation forecasting can reduce time-to-containment and unnecessary disruptions when embedded within human-in-the-loop security workflows. The study contributes an escalation-focused modeling framework, interpretable risk signals aligned with clinical operations, and deployment guidance for hospital security operations. Overall, the findings demonstrate that calibrated ensemble learning can meaningfully enhance proactive cyber risk management in EMR systems when integrated with disciplined governance and incident response practices.

Keywords : Electronic Medical Records (EMR); Healthcare Cybersecurity; Incident Escalation Risk; Ensemble Learning; Predictive Analytics; Security Operations Center (SOC); Ransomware.
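
Illustrative sketch : The abstract describes escalation as a time-bounded outcome at 6, 24, and 72 hours and stresses probability calibration. The following is a minimal sketch of how such labels and a calibration check might be computed, not the authors' implementation. The function names, the single `escalation_time` field, and the choice of the Brier score as the calibration measure are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Horizons (in hours) named in the abstract.
HORIZONS_HOURS = (6, 24, 72)


def escalation_labels(event_time, escalation_time, horizons=HORIZONS_HOURS):
    """Return {horizon_hours: 0 or 1} for a single observed event.

    Assumption: `escalation_time` is the first moment any escalation
    criterion was met (severity reclassification, downtime, and so on),
    or None if the event never escalated.
    """
    labels = {}
    for hours in horizons:
        window_end = event_time + timedelta(hours=hours)
        escalated = (
            escalation_time is not None
            and event_time < escalation_time <= window_end
        )
        labels[hours] = int(escalated)
    return labels


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


if __name__ == "__main__":
    observed = datetime(2025, 1, 1, 8, 0)
    # Hypothetical event that escalated 10 hours after it was first observed.
    print(escalation_labels(observed, observed + timedelta(hours=10)))
    # Hypothetical predicted probabilities against their realized outcomes.
    print(brier_score([0.9, 0.1], [1, 0]))
```

An event that escalates after 10 hours is negative at the 6-hour horizon and positive at the 24- and 72-hour horizons, which is why each horizon needs its own label and, under this framing, its own calibrated model or output.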


