Authors :
Anjanava Biswas; Wrick Talukdar
Volume/Issue :
Volume 9 - 2024, Issue 5 - May
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24MAY2085
Abstract :
Accurate and comprehensive clinical documentation is crucial for delivering high-quality healthcare, facilitating effective communication among providers, and ensuring compliance with regulatory requirements. However, manual transcription and data entry can be time-consuming, error-prone, and inconsistent, leading to incomplete or inaccurate medical records. This paper proposes a novel approach to augmenting clinical documentation that uses synthetic data generation techniques to produce realistic and diverse clinical transcripts.
We present a methodology that combines state-of-the-art generative models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), with real-world clinical transcripts and other forms of clinical data to generate synthetic transcripts. These synthetic transcripts can supplement existing documentation workflows, providing additional training data for natural language processing models and enabling more accurate and efficient transcription. Through extensive experiments on a large dataset of anonymized clinical transcripts, we demonstrate that our approach generates high-quality synthetic transcripts that closely resemble real-world data. Quantitative metrics, including perplexity and BLEU scores, together with qualitative assessments by domain experts, validate the fidelity and utility of the generated transcripts. Our findings highlight the potential of synthetic data generation to address clinical documentation challenges, thereby improving patient care, reducing administrative burden, and enhancing healthcare system efficiency.
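The abstract names perplexity and BLEU as its quantitative metrics but does not define them, so a minimal, dependency-free Python sketch of both may help. It is an illustration under stated assumptions, not the authors' evaluation code: it assumes whitespace-tokenized text, assumes per-token natural-log probabilities have already been obtained from some language model (the abstract does not say which model or tokenizer was used), and the helper names `perplexity`, `ngrams`, and `sentence_bleu` are hypothetical. BLEU here follows the original definition (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty) without smoothing.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (hypothetical helper)."""
    if not candidate or not references:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:  # candidate is shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its largest count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:  # no smoothing: any zero precision gives BLEU = 0
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length
    # (ties broken toward the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    # A model assigning probability 0.25 to every token has a perplexity of about 4.
    print(perplexity([math.log(0.25)] * 4))
    ref = "patient reports chest pain radiating to the left arm".split()
    cand = "patient reports chest pain radiating to left arm".split()
    print(sentence_bleu(cand, [ref]))
```

Lower perplexity means the model finds the text less surprising; BLEU ranges from 0 to 1, reaching 1 for a candidate identical to a reference. Because this sentence-level variant is unsmoothed, a short candidate with no matching 4-gram scores 0 even when most words agree, which is one reason corpus-level BLEU or a smoothing method is commonly reported instead.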