Enhancing Clinical Documentation with Synthetic Data: Leveraging Generative Models for Improved Accuracy


Authors : Anjanava Biswas; Wrick Talukdar

Volume/Issue : Volume 9 - 2024, Issue 5 - May

Google Scholar : https://tinyurl.com/2zds9dbs

Scribd : https://tinyurl.com/jd539u8n

DOI : https://doi.org/10.38124/ijisrt/IJISRT24MAY2085

Abstract : Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry can be time-consuming, error-prone, and inconsistent, leading to incomplete or inaccurate medical records. This paper proposes a novel approach to augmenting clinical documentation that uses synthetic data generation techniques to produce realistic and diverse clinical transcripts. We present a methodology that combines state-of-the-art generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), with real-world clinical transcripts and other forms of clinical data to generate synthetic transcripts. These synthetic transcripts can then supplement existing documentation workflows, providing additional training data for natural language processing models and enabling more accurate and efficient transcription. Through extensive experiments on a large dataset of anonymized clinical transcripts, we demonstrate the effectiveness of our approach in generating high-quality synthetic transcripts that closely resemble real-world data. Quantitative evaluation metrics, including perplexity and BLEU scores, together with qualitative assessments by domain experts, validate the fidelity and utility of the generated synthetic transcripts. Our findings highlight the potential of synthetic data generation to address clinical documentation challenges, improving patient care, reducing administrative burdens, and enhancing healthcare system efficiency.
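For readers unfamiliar with the two quantitative metrics named in the abstract, the following is a minimal Python sketch of how perplexity and a simplified BLEU score can be computed. It is not the authors' evaluation code: the function names, the single-reference BLEU restricted to unigrams and bigrams, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """Perplexity: exp of the mean negative log-probability per token.

    token_probs holds the probability a language model assigned to each
    token of a transcript (hypothetical values in this sketch).
    """
    mean_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_neg_log)


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU (an assumption of this sketch):
    geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example sentences, not drawn from the paper's dataset.
real = "patient reports chest pain for two days".split()
synthetic = "patient reports chest pain since yesterday".split()
score = bleu(synthetic, real)
```

Lower perplexity means the model finds the text more predictable, while a BLEU score closer to 1 means the synthetic transcript shares more n-grams with the real one. Published evaluations typically use up to 4-grams and smoothing, which this sketch omits.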

