Adversarial Threat Simulation in Large Language Models: Red Teaming Beyond Prompt Injection


Authors : Ashwin Sharma; Anshul Goel

Volume/Issue : Volume 10 - 2025, Issue 11 - November


Google Scholar : https://tinyurl.com/3kp5s76s

Scribd : https://tinyurl.com/2em45vfk

DOI : https://doi.org/10.38124/ijisrt/25nov556

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.

Note : Google Scholar may take 30 to 40 days to display the article.


Abstract : Large Language Models (LLMs) are increasingly deployed in sensitive domains, raising urgent questions about their security. Current red-teaming approaches, particularly those that emphasize prompt injection, say little about vulnerabilities outside that narrow class of attacks. This research proposes next-generation methods of adversarial threat simulation for LLM cybersecurity that go beyond the standard focus on prompt injection. A comprehensive theoretical framework is presented for classifying adversarial threats across the entire LLM lifecycle, covering both training and deployment. The manuscript also describes innovative red-teaming approaches, such as scenario-based simulations, automated adversarial generation, and ecosystem-wide red teaming, to give a more complete view of LLM security. The central conclusion is that existing red-team activities are insufficient to address systemic vulnerabilities, leaving LLMs exposed to both single-stage and multi-stage attacks. The study advances a more rigorous approach to securing LLMs and provides guidance on comprehensive red-teaming solutions that account for an expanded attack surface and threat list. The results highlight the importance of continuous, dynamic security evaluation and establish a basis for future research on making LLMs more resilient to emerging adversarial threats.

Keywords : Machine Learning Security, Red Teaming, Prompt Injection, Threat Modeling, Cybersecurity, Adversarial Simulation, Large Language Models, Generative AI.
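As a toy illustration of the automated adversarial generation the abstract mentions, the sketch below runs a breadth-first mutation search against a stub keyword filter. Everything in it (`stub_model`, the `BLOCKED` list, the three mutation operators) is an illustrative assumption for this page, not the paper's actual method or any real model API.

```python
# Toy red-team harness: breadth-first mutation search against a stub
# refusal filter. All names here are hypothetical stand-ins.

BLOCKED = {"exfiltrate", "bypass"}  # hypothetical blocked keywords

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM endpoint guarded by a keyword filter."""
    if any(word in prompt.lower() for word in BLOCKED):
        return "REFUSED"
    return "COMPLIED"

def hyphenate_longest(p: str) -> str:
    """Split the longest word with a hyphen (evades substring matching)."""
    w = max(p.split(), key=len)
    mid = len(w) // 2
    return p.replace(w, w[:mid] + "-" + w[mid:], 1)

def flip_case(p: str) -> str:
    """Swap the case of the first character (defeated by lower-casing)."""
    return p[:1].swapcase() + p[1:]

def add_framing(p: str) -> str:
    """Append a role-play framing suffix (no effect on this filter)."""
    return p + " (for a security training demo)"

OPS = [hyphenate_longest, flip_case, add_framing]

def red_team_search(seed: str, rounds: int = 2) -> list:
    """Expand the frontier by mutation; collect prompts that slip past."""
    frontier, successes = [seed], []
    for _ in range(rounds):
        nxt = []
        for p in frontier:
            for op in OPS:
                cand = op(p)
                if stub_model(cand) == "COMPLIED":
                    successes.append(cand)
                nxt.append(cand)
        frontier = nxt
    return successes
```

Real harnesses such as garak (reference 7 below) follow the same generate-probe-score loop, but against live model endpoints and with far richer mutation and detection strategies.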

References :

  1. Abdali, S., Anarfi, R., Barberan, C., He, J., & Shayegani, E. (2024). Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices. https://doi.org/10.48550/ARXIV.2403.12503
  2. Abuadbba, A., Hicks, C., Moore, K., Mavroudis, V., Hasircioglu, B., Goel, D., & Jennings, P. (2025). From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs. https://doi.org/10.48550/ARXIV.2506.13434
  3. Al-Azzawi, M., Doan, D., Sipola, T., Hautamäki, J., & Kokkonen, T. (2025). Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review. https://doi.org/10.48550/ARXIV.2503.19626
  4. Amich, A. (2024). Multifaceted Characterization and Enhancement of Machine Learning Security. Deep Blue (University of Michigan). https://doi.org/10.7302/24910
  5. Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N., Lutz, R., Dheekonda, R. S. R., Jagdagdorj, B.-E., Kim, E., Song, J., Hines, K., Jones, D., Severi, G., Lundeen, R., … Russinovich, M. (2025). Lessons From Red Teaming 100 Generative AI Products. https://doi.org/10.48550/ARXIV.2501.07238
  6. Das, B. C., Amini, M. H., & Wu, Y. (2025). Security and Privacy Challenges of Large Language Models: A Survey [Review of Security and Privacy Challenges of Large Language Models: A Survey]. ACM Computing Surveys. Association for Computing Machinery. https://doi.org/10.1145/3712001
  7. Derczynski, L., Galinkin, E., Martin, J., Majumdar, S., & Inie, N. (2024). garak: A Framework for Security Probing Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.11036
  8. Derner, E., Batistič, K., Zahálka, J., & Babuška, R. (2023). A Security Risk Taxonomy for Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2311.11415
  9. Dong, Y., Mu, R., Zhang, Y., Sun, S., Zhang, T., Wu, C., Jin, G., Qi, Y., Hu, J., Meng, J., Bensalem, S., & Huang, X. (2024). Safeguarding Large Language Models: A Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.02622
  10. Feffer, M., Sinha, A., Deng, W. H., Lipton, Z. C., & Heidari, H. (2024). Red-Teaming for Generative AI: Silver Bullet or Security Theater? https://doi.org/10.48550/ARXIV.2401.15897
  11. Fu, T., Sharma, M., Torr, P., Cohen, S. B., Krueger, D., & Barez, F. (2024). PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning. https://doi.org/10.48550/ARXIV.2410.08811
  12. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. 79. https://doi.org/10.1145/3605764.3623985
  13. Hassanin, M., & Moustafa, N. (2024). A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2405.14487
  14. Hill, B., Parla, S., Balabhadruni, V. A., Padmalayam, A. P., & Sharma, S. C. S. (2025). Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs. https://doi.org/10.48550/ARXIV.2509.04615
  15. Huang, X., Ruan, W., Huang, W., Jin, G., Dong, Y., Wu, C., Bensalem, S., Mu, R., Yi, Q., Zhao, X., Cai, K., Zhang, Y., Wu, S., Xu, P., Wu, D., Freitas, A., & Mustafa, M. (2024). A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artificial Intelligence Review, 57(7). https://doi.org/10.1007/s10462-024-10824-0
  16. Khomsky, D., Maloyan, N., & Nutfullin, B. (2024). Prompt Injection Attacks in Defended Systems. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.14048
  17. Li, H., Chen, Y., Luo, J., Kang, Y., Zhang, X., Hu, Q., Chan, C., & Song, Y. (2023). Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10383
  18. Li, M. Q., & Fung, B. C. M. (2025). Security concerns for Large Language Models: A survey. Journal of Information Security and Applications, 95, 104284. https://doi.org/10.1016/j.jisa.2025.104284
  19. Liu, F. W., & Hu, C. (2024). Exploring Vulnerabilities and Protections in Large Language Models: A Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.00240
  20. Liu, S., Sheng, Q., Wang, D., Li, Y., Yang, G., & Cao, J. (2025). Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks. https://doi.org/10.48550/ARXIV.2508.20038
  21. Majumdar, S., Pendleton, B., & Gupta, A. (2025). Red Teaming AI Red Teaming. https://doi.org/10.48550/ARXIV.2507.05538
  22. Miranda, M., Ruzzetti, E. S., Santilli, A., Zanzotto, F. M., Bratières, S., & Rodolà, E. (2024). Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2408.05212
  23. Pathade, C. (2025). Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs. https://doi.org/10.48550/ARXIV.2505.04806
  24. Purpura, A., Wadhwa, S., Zymet, J., Gupta, A., Luo, A. A., Rad, M. K., Shinde, S., & Sorower, M. S. (2025). Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models. 335. https://doi.org/10.18653/v1/2025.trustnlp-main.23
  25. Qiang, Y., Zhou, X., Zade, S. Z., Roshani, M. A., Zytko, D., & Zhu, D. (2024). Learning to Poison Large Language Models During Instruction Tuning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2402.13459
  26. Radanliev, P., & Santos, O. (2023). Adversarial Attacks Can Deceive AI Systems, Leading to Misclassification or Incorrect Decisions. https://doi.org/10.20944/preprints202309.2064.v1
  27. Rawat, A., Schoepf, S., Zizzo, G., Cornacchia, G., Hameed, M. Z., Fraser, K., Miehling, E., Buesser, B., Daly, E. M., Purcell, M., Sattigeri, P., Chen, P.-Y., & Varshney, K. R. (2024). Attack Atlas: A Practitioner’s Perspective on Challenges and Pitfalls in Red Teaming GenAI. https://doi.org/10.48550/ARXIV.2409.15398
  28. Shayegani, E., Mamun, M. A. A., Fu, Y., Zaree, P., Dong, Y., & Abu‐Ghazaleh, N. (2023). Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10844
  29. Swanda, A., Chang, A., Chen, A., Burch, F., Kassianik, P., & Berlin, K. (2025). A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2509.20639
  30. Tete, S. B. (2024). Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.11007
  31. Verma, A., Krishna, S., Gehrmann, S., Seshadri, M., Pradhan, A., Ault, T., Barrett, L., Rabinowitz, D., Doucette, J., & Phan, N. (2024). Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs). arXiv (Cornell University). https://doi.org/10.48550/arxiv.2407.14937
  32. Vitorino, J., Maia, E., & Praça, I. (2024). Adversarial Evasion Attack Efficiency against Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.08050
  33. Wan, A., Wallace, E., Shen, S., & Klein, D. (2023). Poisoning Language Models During Instruction Tuning. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.00944
  34. Wang, N., Walter, K., Gao, Y., & Abuadbba, A. (2025). Large Language Model Adversarial Landscape Through the Lens of Attack Objectives. https://doi.org/10.48550/ARXIV.2502.02960
  35. Wang, S., Zhu, T., Liu, B., Ding, M., Guo, X., Ye, D., & Zhou, W. (2024). Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2406.07973
  36. Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., & Zhang, Y. (2024). A survey on large language model (LLM) security and privacy: The Good, The Bad, and The Ugly. High-Confidence Computing, 4(2), 100211. https://doi.org/10.1016/j.hcc.2024.100211
  37. Yip, D. W., Esmradi, A., & Chan, C. F. (2023). A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models. 2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). https://doi.org/10.1109/csde59766.2023.10487667
  38. Zangana, H. M., Mustafa, F. M., & Li, S. (2024). Large Language Models in Cybersecurity. In Advances in information security, privacy, and ethics book series (p. 277). IGI Global. https://doi.org/10.4018/979-8-3373-1102-9.ch009
  39. Zhang, X., Lyu, D., & Li, X. (2025). Risk Assessment and Security Analysis of Large Language Models. https://doi.org/10.48550/ARXIV.2508.17329

