Authors :
Tharakesvulu Vangalapat; Samreen Iftekhar Shaikh
Volume/Issue :
Volume 11 - 2026, Issue 2 - February
Google Scholar :
https://tinyurl.com/2vddtsr3
Scribd :
https://tinyurl.com/5n8c7ftw
DOI :
https://doi.org/10.38124/ijisrt/26feb1090
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Abstract :
Background:
Large language model (LLM)-based agentic systems are evolving beyond single-turn generators into autonomous, tool-using, multi-agent workflows with persistent memory and self-directed planning. When these agents collaborate,
hallucinations no longer remain local; they can propagate across agent boundaries and trigger real-world operational
failures.
Objective:
This paper provides a structured foundation for building secure, reliable, and hallucination-resilient agentic AI by
surveying failure modes and proposing a layered trust taxonomy with enforceable controls.
Methods:
We reviewed literature on LLM reasoning, retrieval-augmented generation (RAG), hallucination detection, multi-agent
frameworks, and AI governance (including zero-trust security principles) and catalogued failure modes across reliability
and security dimensions.
Results:
We present a seven-layer trust taxonomy spanning identity, planning, communication, memory, retrieval, execution,
and oversight. From this taxonomy, we derive six reusable secure-coordination design patterns and propose a model-agnostic
reference architecture for auditable, policy-enforced agentic workflows.
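As a rough illustration of how such a layered taxonomy could be operationalized, the sketch below models the seven trust layers named in the abstract as an ordered evidence checklist that an agent action must satisfy before execution. This is a hypothetical reading of the taxonomy, not the paper's implementation; the `AgentAction` structure, evidence fields, and check logic are all illustrative assumptions.

```python
# Hypothetical sketch: the seven trust layers from the taxonomy modeled as an
# ordered policy checklist. Layer names come from the abstract; the data
# structure and check function are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field

TRUST_LAYERS = [
    "identity", "planning", "communication",
    "memory", "retrieval", "execution", "oversight",
]

@dataclass
class AgentAction:
    agent_id: str
    # Maps a trust layer to whatever evidence satisfies it (token, approved
    # plan, signed message, provenance record, etc.) -- all hypothetical here.
    layer_evidence: dict = field(default_factory=dict)

def unverified_layers(action: AgentAction) -> list[str]:
    """Return the layers for which no evidence was supplied; empty means
    the action passes every layer of the checklist."""
    return [layer for layer in TRUST_LAYERS if layer not in action.layer_evidence]

# An action that has only cleared identity and planning checks:
action = AgentAction("planner-1", {"identity": "signed-token", "planning": "approved-plan"})
print(unverified_layers(action))  # the five layers still lacking evidence
```

A real enforcement layer would of course validate the evidence itself rather than its mere presence, but the structure shows why trust becomes a system property: each layer is checked independently of the model that produced the action.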
Conclusion:
Trustworthiness in agentic AI is fundamentally a system property, not merely a model property. The proposed
taxonomy and design patterns provide practical, implementation-independent guidance for securing multi-agent LLM
deployments in research and high-assurance enterprise contexts.
Plain Language Summary:
AI systems built from large language models can now plan tasks, use tools, and work together as teams of specialized
agents. This collaboration creates new dangers: when one agent fabricates information, the other agents may act on it as
though it were true, spreading errors throughout the system. This paper maps where trust breaks down, from agent identity
and message passing to memory and tool use, and proposes design rules and a system blueprint for more reliable and secure
AI teams.
Keywords :
agentic AI, multi-agent systems, trustworthy AI, hallucination mitigation, retrieval-augmented generation, tool use, zero trust, AI governance.
References :
- A. Vaswani et al., “Attention is all you need,” in NeurIPS, 2017.
- T. Brown et al., “Language models are few-shot learners,” in NeurIPS, 2020.
- C. Raffel et al., “Exploring the limits of transfer learning with T5,” JMLR, 2020.
- OpenAI, “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
- S. Bubeck et al., “Sparks of AGI,” arXiv, 2023.
- S. Yao et al., “ReAct: Synergizing reasoning and acting in LLMs,” ICLR, 2023.
- T. Schick et al., “Toolformer: Language models can teach themselves to use tools,” NeurIPS, 2023.
- L. Gao et al., “Program-aided language models,” ICML, 2023.
- Q. Wu et al., “AutoGen: Multi-agent conversation framework,” arXiv, 2023.
- J. S. Park et al., “Generative agents,” in UIST, 2023.
- J. Wei et al., “Chain-of-thought prompting elicits reasoning in LLMs,” NeurIPS, 2022.
- P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP,” NeurIPS, 2020.
- K. Guu et al., “REALM: Retrieval-augmented language model pre-training,” ICML, 2020.
- G. Izacard and E. Grave, “Leveraging passage retrieval with generative models,” arXiv, 2021.
- S. Robertson and H. Zaragoza, “BM25 and beyond,” Foundations and Trends in IR, 2009.
- O. Khattab and M. Zaharia, “ColBERT: Efficient passage search,” SIGIR, 2020.
- R. Nakano et al., “WebGPT,” arXiv, 2021.
- Y. Bai et al., “Constitutional AI,” arXiv, 2022.
- R. Bommasani et al., “On the opportunities and risks of foundation models,” arXiv, 2021.
- S. Rose et al., “Zero trust architecture,” NIST, Tech. Rep., 2020.
- NIST, “AI risk management framework,” Tech. Rep., 2023.
- Z. Ji et al., “Survey of hallucination in natural language generation,” ACM Computing Surveys, 2023.
- S. Lin et al., “TruthfulQA,” ACL, 2022.
- P. Manakul et al., “SelfCheckGPT,” EMNLP, 2023.
- N. Carlini et al., “Prompt injection attacks against llm applications,” arXiv, 2023.
- P. Christiano et al., “Deep reinforcement learning from human preferences,” NeurIPS, 2017.
- L. Ouyang et al., “Training LMs to follow instructions,” arXiv, 2022.
- S. Min et al., “FActScore,” ICML, 2023.
- A. Madaan et al., “Self-refine: Iterative refinement,” NeurIPS, 2023.
- A. Zou et al., “Universal and transferable adversarial attacks on aligned LLMs,” arXiv, 2023.
- D. Ganguli et al., “Red teaming language models,” arXiv, 2022.
- C. Dwork, “Differential privacy,” ICALP, 2006.
- M. Abadi et al., “Deep learning with differential privacy,” CCS, 2016.