NLP Based Prediction of Hospital Readmission using ClinicalBERT and Clinician Notes


Authors : L. Matondora; M. Mutandavari; B. Mupini

Volume/Issue : Volume 9 - 2024, Issue 7 - July


Google Scholar : https://tinyurl.com/msseyway

Scribd : https://tinyurl.com/2v8jh4hd

DOI : https://doi.org/10.38124/ijisrt/IJISRT24JUL1191



Abstract : Hospital readmissions pose a significant challenge in healthcare, leading to increased costs, poorer patient outcomes, and strained healthcare systems. Accurately predicting the risk of hospital readmission is crucial for implementing targeted interventions and improving patient care. This study investigates the use of natural language processing (NLP) techniques, specifically the ClinicalBERT model, to predict the risk of hospital readmission using the first 3-5 days of clinical notes, excluding discharge notes. We compare the performance of ClinicalBERT with that of other machine learning models, including logistic regression, random forest, and XGBoost, to identify the most effective approach for this task. The results demonstrate that ClinicalBERT outperforms the other techniques, achieving higher accuracy, F1-score, and area under the receiver operating characteristic (ROC) curve. Such early predictions can also give clinicians time to intervene with patients who are at high risk. This study highlights the potential of deep learning-based NLP models in the clinical domain to improve patient care and reduce the burden of hospital readmissions, even when only the initial clinical notes from a patient's hospitalization are used.

Keywords : Hospital Readmission, Clinical Notes, ClinicalBERT, Deep Learning
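
The abstract compares models on accuracy, F1-score, and area under the ROC curve. As a minimal sketch of how these three metrics can be computed for a binary readmission classifier, the dependency-free Python below may help. The function names, the 0.5 decision threshold, and the example labels and scores are assumptions made for illustration only; they are not the authors' code or results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = readmitted, 0 = not readmitted.
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")
print(f"ROC AUC  = {roc_auc(y_true, y_score):.3f}")
```

Accuracy and F1-score depend on the chosen threshold, whereas ROC AUC is computed from the raw scores, which is one reason the three are often reported together.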

