Authors :
Mohammed Umar Alhaji; Usman Ahmad Ahmad; Nasir Muazu Abba
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/yntn9dsp
Scribd :
https://tinyurl.com/rnw6nmh9
DOI :
https://doi.org/10.38124/ijisrt/26jan1026
Abstract :
Reinforcement learning has emerged as a promising approach for adaptive cancer therapy due to its ability to
optimize sequential treatment decisions under uncertainty. While studies have demonstrated the potential of
reinforcement learning to improve simulated treatment outcomes, most evaluations rely primarily on average
performance metrics obtained through direct simulation rollouts. Such evaluation practices provide limited insight into
uncertainty, robustness, and worst-case behavior, which are critical considerations in safety-sensitive clinical domains.
This study proposes a standardized, risk-aware, and uncertainty-sensitive evaluation framework for reinforcement
learning-based adaptive cancer therapy using simulated tumor environments. A Deep Q-Network policy is evaluated
against clinically interpretable baselines from multiple performance perspectives, including mean outcomes, worst-case
metrics, and tail-risk measures based on Conditional Value at Risk. Robustness is further assessed under parameter
perturbations and distribution shifts representing aggressive tumor dynamics. Experimental results demonstrate that
adaptive reinforcement learning policies achieve tumor control comparable to maximum-dose therapy while maintaining
controlled risk exposure and stable performance under uncertainty. The findings emphasize that rigorous, risk-sensitive
evaluation is essential for drawing reliable conclusions about reinforcement learning-based treatment strategies before any
real-world deployment.
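The tail-risk evaluation described above can be illustrated with a minimal sketch. Conditional Value at Risk (CVaR) at level alpha is the mean of the worst alpha-fraction of episode returns, so it captures exactly the worst-case behavior that average metrics hide. The function name and the simulated episode returns below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Conditional Value at Risk: mean of the worst alpha-fraction of returns.

    Lower returns are worse here, so CVaR is the average over the
    lower alpha-tail of the empirical return distribution.
    """
    returns = np.sort(np.asarray(returns, dtype=float))
    # Number of episodes in the worst alpha-fraction (at least one)
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

# Hypothetical episode returns from simulated treatment rollouts
rng = np.random.default_rng(0)
episode_returns = rng.normal(loc=10.0, scale=3.0, size=1000)

mean_return = episode_returns.mean()
tail_risk = cvar(episode_returns, alpha=0.05)  # average of the worst 5% of episodes
```

Comparing `mean_return` against `tail_risk` for each policy is what distinguishes a risk-sensitive evaluation from one based on averages alone: two policies with similar means can differ sharply in their worst-case tails.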
Keywords :
Reinforcement Learning; Adaptive Cancer Therapy; Risk-Aware Evaluation; Conditional Value at Risk; Simulation-Based Evaluation; Deep Q-Network.