Authors :
Nandu Rajesh; Venkidesh Venu; Paul George; Anjaly Muralidharan
Volume/Issue :
Volume 11 - 2026, Issue 2 - February
Google Scholar :
https://tinyurl.com/ycyxfesb
Scribd :
https://tinyurl.com/kc94p6h6
DOI :
https://doi.org/10.38124/ijisrt/26feb1434
Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.
Abstract :
Producing high-quality 3D content remains a complex, skill-intensive task that typically requires
expertise in professional software such as Blender, Autodesk Maya, or 3ds Max. As a result, many concepts from
designers, educators, and creators never move beyond the idea stage for lack of technical skill. This paper presents
Prompt3D, a multi-agent conversational system that lets users create and edit 3D scenes with simple natural
language instructions. The system integrates a large language model (Google Gemini 2.5 Pro) with Blender via a
well-organized five-stage pipeline comprising intent understanding, context retrieval using Retrieval-Augmented
Generation (RAG), tool planning, execution, and verification. A standardized Model Context Protocol (MCP) manages
over 50 specialized tools implemented through a Python-based Blender addon. Experimental results demonstrate
that common modeling tasks complete in 5-15 seconds and that adaptive rendering optimizations cut computation time by up to
50%. User studies show that both beginners and advanced users find the system functionally accurate and highly
usable. Overall, Prompt3D demonstrates a practical approach to making professional-quality 3D content creation more
accessible without sacrificing flexibility or control.
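The five-stage pipeline described above can be sketched as a minimal, self-contained toy. All class, tool, and document names below are illustrative assumptions, not the authors' actual implementation; in the real system, the tool registry holds 50+ MCP-exposed Blender-addon operations and the intent/planning stages are driven by Gemini 2.5 Pro rather than keyword matching.

```python
# Hypothetical sketch of Prompt3D's five-stage pipeline:
# intent understanding -> RAG context retrieval -> tool planning ->
# execution -> verification. Every name here is an assumption.

class ToyPipeline:
    # Stand-in tool registry; the real system dispatches these over MCP
    # to a Python-based Blender addon.
    TOOLS = {
        "add_cube": lambda scene: scene.append("cube"),
        "add_light": lambda scene: scene.append("light"),
    }

    # Stand-in document store queried during the RAG stage.
    DOCS = {
        "cube": "add_cube creates a unit cube at the origin",
        "light": "add_light adds a point light to the scene",
    }

    def understand_intent(self, prompt: str) -> list[str]:
        # Stage 1: extract the objects the user asked for
        # (keyword match here; an LLM in the real system).
        return [w for w in ("cube", "light") if w in prompt.lower()]

    def retrieve_context(self, intents: list[str]) -> list[str]:
        # Stage 2: fetch relevant tool documentation (RAG stand-in).
        return [self.DOCS[i] for i in intents if i in self.DOCS]

    def plan_tools(self, intents: list[str]) -> list[str]:
        # Stage 3: map each intent to a registered tool call.
        return [f"add_{i}" for i in intents if f"add_{i}" in self.TOOLS]

    def execute(self, plan: list[str], scene: list[str]) -> None:
        # Stage 4: run each planned tool against the scene.
        for tool in plan:
            self.TOOLS[tool](scene)

    def verify(self, intents: list[str], scene: list[str]) -> bool:
        # Stage 5: confirm every requested object is now in the scene.
        return all(i in scene for i in intents)

    def run(self, prompt: str, scene: list[str]) -> bool:
        intents = self.understand_intent(prompt)
        self.retrieve_context(intents)  # would condition the planner LLM
        plan = self.plan_tools(intents)
        self.execute(plan, scene)
        return self.verify(intents, scene)
```

Running `ToyPipeline().run("Add a cube and a light", scene)` on an empty `scene` list plans and executes both tool calls and then verifies the result, mirroring how the closed loop lets the system check its own output before reporting success.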
Keywords :
Natural Language Processing, 3D Content Creation, Large Language Models, Multi-Agent Systems, Retrieval-Augmented Generation, Human-Computer Interaction.
References :
- “Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions,” 2024.
- J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou, “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Advances in Neural Information Processing Systems, vol. 35, pp. 24824–24837, 2022.
- S. Hong, X. Zheng, J. Chen, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, and C. Wu, “MetaGPT: Meta Programming for Multi-Agent Collaborative Framework,” arXiv preprint arXiv:2308.00352, 2023.
- B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “DreamFusion: Text-to-3D using 2D Diffusion,” arXiv preprint arXiv:2209.14988, 2022.
- C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin, “Magic3D: High-Resolution Text-to-3D Content Creation,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- “3D-GPT: Transforming Language Instructions into 3D Modeling Commands,” arXiv preprint arXiv:2307.xxxxx, 2023.
- “SceneCraft: Natural Language to Blender Python Scripts for Complex Scenes,” arXiv preprint arXiv:2401.xxxxx, 2024.
- “BlenderLLM: CAD Script Generation from Natural Language Instructions,” arXiv preprint arXiv:2404.xxxxx, 2024.
- “3D-LLM: Grounding Language Models in 3D Spatial Understanding,” arXiv preprint arXiv:2405.xxxxx, 2024.
- “DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation,” arXiv preprint arXiv:2311.xxxxx, 2023.
- “Hunyuan3D 2.0: Scaling Diffusion for High-Resolution 3D Asset Generation,” arXiv preprint arXiv:2406.xxxxx, 2024.
- Google DeepMind, “Gemini: A Family of Highly Capable Multimodal Models,” arXiv preprint arXiv:2312.11805, 2023.
- P. Yuan, H. Li, K. Zhao, et al., “EASYTOOL: Enhancing LLM Tool-Use via Structured Documentation,” arXiv preprint arXiv:2403.xxxxx, 2024.
- Q. Lu, Y. Wang, X. Chen, et al., “TOOLSANDBOX: Benchmarking LLM Tool-Use in Realistic Multi-Turn Tasks,” arXiv preprint arXiv:2404.xxxxx, 2024.
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459–9474.
- Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and H. Wang, “Retrieval-Augmented Generation for Large Language Models: A Survey,” arXiv preprint arXiv:2312.10997, 2023.