Adversarial Threat Simulation in Large Language Models: Red Teaming Beyond Prompt Injection


Authors : Ashwin Sharma; Anshul Goel

Volume/Issue : Volume 10 - 2025, Issue 11 - November


Google Scholar : https://tinyurl.com/3kp5s76s

Scribd : https://tinyurl.com/2em45vfk

DOI : https://doi.org/10.38124/ijisrt/25nov556

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.

Note : Google Scholar may take 30 to 40 days to display the article.


Abstract : Large Language Models (LLMs) are increasingly deployed in sensitive domains, raising urgent questions about their security. Current red-teaming approaches, particularly those that emphasize prompt injection, say little about vulnerabilities outside that narrow class of attacks. This research proposes next-generation methods of adversarial threat simulation for LLM cybersecurity that go beyond the standard focus on prompt injection. A comprehensive theoretical framework is presented for classifying adversarial threats across the entire LLM lifecycle, covering both training and deployment. The manuscript also describes innovative red-teaming approaches, such as scenario-based simulations, automated adversarial generation, and ecosystem-wide red teaming, to give a more complete view of LLM security. The central conclusion is that existing red-team activities are insufficient to address systemic vulnerabilities, leaving LLMs exposed to both single-stage and multi-stage attacks. The study advances a more rigorous approach to securing LLMs and provides guidance on comprehensive red-teaming solutions that account for an expanded attack surface and threat list. The results highlight the importance of continuous, dynamic security evaluation and establish a basis for future research on making LLMs more resilient to emerging adversarial threats.

Keywords : Machine Learning Security, Red Teaming, Prompt Injection, Threat Modeling, Cybersecurity, Adversarial Simulation, Large Language Models, Generative AI.
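As a toy illustration of the automated adversarial generation the abstract mentions, the sketch below runs a breadth-first mutation search against a stub keyword filter. Everything in it (`stub_model`, the `BLOCKED` list, the three mutation operators) is an illustrative assumption for this page, not the paper's actual method or any real model API.

```python
# Toy red-team harness: breadth-first mutation search against a stub
# refusal filter. All names here are hypothetical stand-ins.

BLOCKED = {"exfiltrate", "bypass"}  # hypothetical blocked keywords

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM endpoint guarded by a keyword filter."""
    if any(word in prompt.lower() for word in BLOCKED):
        return "REFUSED"
    return "COMPLIED"

def hyphenate_longest(p: str) -> str:
    """Split the longest word with a hyphen (evades substring matching)."""
    w = max(p.split(), key=len)
    mid = len(w) // 2
    return p.replace(w, w[:mid] + "-" + w[mid:], 1)

def flip_case(p: str) -> str:
    """Swap the case of the first character (defeated by lower-casing)."""
    return p[:1].swapcase() + p[1:]

def add_framing(p: str) -> str:
    """Append a role-play framing suffix (no effect on this filter)."""
    return p + " (for a security training demo)"

OPS = [hyphenate_longest, flip_case, add_framing]

def red_team_search(seed: str, rounds: int = 2) -> list:
    """Expand the frontier by mutation; collect prompts that slip past."""
    frontier, successes = [seed], []
    for _ in range(rounds):
        nxt = []
        for p in frontier:
            for op in OPS:
                cand = op(p)
                if stub_model(cand) == "COMPLIED":
                    successes.append(cand)
                nxt.append(cand)
        frontier = nxt
    return successes
```

Real harnesses such as garak (reference 7 below) follow the same generate-probe-score loop, but against live model endpoints and with far richer mutation and detection strategies.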

References :

  1. Abdali, S., Anarfi, R., Barberan, C., He, J., & Shayegani, E. (2024). Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices. https://doi.org/10.48550/ARXIV.2403.12503
  2. Abuadbba, A., Hicks, C., Moore, K., Mavroudis, V., Hasircioglu, B., Goel, D., & Jennings, P. (2025). From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs. https://doi.org/10.48550/ARXIV.2506.13434
  3. Al-Azzawi, M., Doan, D., Sipola, T., Hautamäki, J., & Kokkonen, T. (2025). Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review. https://doi.org/10.48550/ARXIV.2503.19626
  4. Amich, A. (2024). Multifaceted Characterization and Enhancement of Machine Learning Security. Deep Blue (University of Michigan). https://doi.org/10.7302/24910
  5. Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. https://doi.org/10.48550/ARXIV.2501.07238
  6. Das, B. C., Amini, M. H., & Wu, Y. (2025). Security and Privacy Challenges of Large Language Models: A Survey [Review of Security and Privacy Challenges of Large Language Models: A Survey]. ACM Computing Surveys. Association for Computing Machinery. https://doi.org/10.1145/3712001
  7. Derczynski, L., Galinkin, E., Martin, J., Majumdar, S., & Inie, N. (2024). garak: A Framework for Security Probing Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.11036
  8. Derner, E., Batistič, K., Zahálka, J., & Babuška, R. (2023). A Security Risk Taxonomy for Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2311.11415
  9. Dong, Y., Mu, R., Zhang, Y., Sun, S., Zhang, T., Wu, C., Jin, G., Qi, Y., Hu, J., Meng, J., Bensalem, S., & Huang, X. (2024). Safeguarding Large Language Models: A Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.02622
  10. Feffer, M., Sinha, A., Deng, W. H., Lipton, Z. C., & Heidari, H. (2024). Red-Teaming for Generative AI: Silver Bullet or Security Theater? https://doi.org/10.48550/ARXIV.2401.15897
  11. Fu, T., Sharma, M., Torr, P., Cohen, S. B., Krueger, D., & Barez, F. (2024). PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning. https://doi.org/10.48550/ARXIV.2410.08811
  12. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. 79. https://doi.org/10.1145/3605764.3623985
  13. Hassanin, M., & Moustafa, N. (2024). A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2405.14487
  14. Hill, B., Parla, S., Balabhadruni, V. A., Padmalayam, A. P., & Sharma, S. C. S. (2025). Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs. https://doi.org/10.48550/ARXIV.2509.04615
  15. Huang, X., Ruan, W., Huang, W., Jin, G., Dong, Y., Wu, C., Bensalem, S., Mu, R., Yi, Q., Zhao, X., Cai, K., Zhang, Y., Wu, S., Xu, P., Wu, D., Freitas, A., & Mustafa, M. (2024). A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review, 57(7). https://doi.org/10.1007/s10462-024-10824-0
  16. Khomsky, D., Maloyan, N., & Nutfullin, B. (2024). Prompt Injection Attacks in Defended Systems. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.14048
  17. Li, H., Chen, Y., Luo, J., Kang, Y., Zhang, X., Hu, Q., Chan, C., & Song, Y. (2023). Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10383
  18. Li, M. Q., & Fung, B. C. M. (2025). Security concerns for Large Language Models: A survey. Journal of Information Security and Applications, 95, 104284. https://doi.org/10.1016/j.jisa.2025.104284
  19. Liu, F. W., & Hu, C. (2024). Exploring Vulnerabilities and Protections in Large Language Models: A Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.00240
  20. Liu, S., Sheng, Q., Wang, D., Li, Y., Yang, G., & Cao, J. (2025). Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks. https://doi.org/10.48550/ARXIV.2508.20038
  21. Majumdar, S., Pendleton, B., & Gupta, A. (2025). Red Teaming AI Red Teaming. https://doi.org/10.48550/ARXIV.2507.05538
  22. Miranda, M., Ruzzetti, E. S., Santilli, A., Zanzotto, F. M., Bratières, S., & Rodolà, E. (2024). Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2408.05212
  23. Pathade, C. (2025). Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs. https://doi.org/10.48550/ARXIV.2505.04806
  24. Purpura, A., Wadhwa, S., Zymet, J., Gupta, A., Luo, A. A., Rad, M. K., Shinde, S., & Sorower, M. S. (2025). Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models. 335. https://doi.org/10.18653/v1/2025.trustnlp-main.23
  25. Qiang, Y., Zhou, X., Zade, S. Z., Roshani, M. A., Zytko, D., & Zhu, D. (2024). Learning to Poison Large Language Models During Instruction Tuning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2402.13459
  26. Radanliev, P., & Santos, O. (2023). Adversarial Attacks Can Deceive AI Systems, Leading to Misclassification or Incorrect Decisions. https://doi.org/10.20944/preprints202309.2064.v1
  27. Rawat, A., Schoepf, S., Zizzo, G., Cornacchia, G., Hameed, M. Z., Fraser, K., Miehling, E., Buesser, B., Daly, E. M., Purcell, M., Sattigeri, P., Chen, P.-Y., & Varshney, K. R. (2024). Attack Atlas: A Practitioner’s Perspective on Challenges and Pitfalls in Red Teaming GenAI. https://doi.org/10.48550/ARXIV.2409.15398
  28. Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., & Abu‐Ghazaleh, N. (2023). Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10844
  29. Swanda, A., Chang, A., Chen, A., Burch, F., Kassianik, P., & Berlin, K. (2025). A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2509.20639
  30. Tete, S. B. (2024). Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.11007
  31. Verma, A., Krishna, S., Gehrmann, S., Seshadri, M., Pradhan, A., Ault, T., Barrett, L., Rabinowitz, D., Doucette, J., & Phan, N. (2024). Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs). arXiv (Cornell University). https://doi.org/10.48550/arxiv.2407.14937
  32. Vitorino, J., Maia, E., & Praça, I. (2024). Adversarial Evasion Attack Efficiency against Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.08050
  33. Wan, A., Wallace, E., Shen, S., & Klein, D. (2023). Poisoning Language Models During Instruction Tuning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.00944
  34. Wang, N., Walter, K., Gao, Y., & Abuadbba, A. (2025). Large Language Model Adversarial Landscape Through the Lens of Attack Objectives. https://doi.org/10.48550/ARXIV.2502.02960
  35. Wang, S., Zhu, T., Liu, B., Ding, M., Guo, X., Ye, D., & Zhou, W. (2024). Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.07973
  36. Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., & Zhang, Y. (2024). A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confidence Computing, 4(2), 100211. https://doi.org/10.1016/j.hcc.2024.100211
  37. Yip, D. W., Esmradi, A., & Chan, C. F. (2023). A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models. 2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). https://doi.org/10.1109/csde59766.2023.10487667
  38. Zangana, H. M., Mustafa, F. M., & Li, S. (2024). Large Language Models in Cybersecurity. In Advances in information security, privacy, and ethics book series (p. 277). IGI Global. https://doi.org/10.4018/979-8-3373-1102-9.ch009
  39. Zhang, X., Lyu, D., & Li, X. (2025). Risk Assessment and Security Analysis of Large Language Models. https://doi.org/10.48550/ARXIV.2508.17329

