International Journal of Innovative Science and Research Technology

Towards Autonomous Kubernetes: A Framework for AI-Driven Operations (AIOps)

Authors : Kishan Raj Bellala

Volume/Issue : Volume 10 - 2025, Issue 9 - September

Google Scholar : https://tinyurl.com/27zxaphd

Scribd : https://tinyurl.com/4terns6x

DOI : https://doi.org/10.38124/ijisrt/25sep1016

Abstract : The growing adoption of Kubernetes has brought substantial operational complexity: manually managing its many dynamic components (pods, nodes, networks) is slow, error-prone, and unsustainable at scale. This research investigates how AIOps (Artificial Intelligence for IT Operations) principles can move beyond native automation toward fully autonomous Kubernetes management. The proposed framework uses machine learning to detect anomalies, identify root causes, and predict scaling needs before executing automated remediation steps. Our methodology demonstrates that AIOps can enhance system reliability, reduce operational toil, and optimize resource efficiency through closed-loop observation-action cycles, leading to self-healing Kubernetes ecosystems that require minimal human intervention.

Keywords : Kubernetes, AIOps (Artificial Intelligence for IT Operations), Autonomous Operations, Self-Healing Systems, Anomaly Detection, Root Cause Analysis (RCA), Predictive Scaling, Automated Remediation, Operational Complexity, Machine Learning for IT Operations, Container Orchestration, Site Reliability Engineering (SRE).
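
The closed-loop observation-action cycle described in the abstract can be illustrated with a minimal sketch. This is not the paper's framework: the rolling z-score detector, the threshold of 3.0, the window size, and the action labels ("scale_up", "investigate") are all assumptions chosen for illustration, and the remediation step only returns a decision rather than calling any Kubernetes API.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional, Tuple


@dataclass
class Action:
    kind: str    # hypothetical action label, not a Kubernetes API object
    reason: str


def is_anomalous(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations away from the mean of the recent `history` window."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]  # flat window: any change is a deviation
    return abs(latest - mean(history)) / sigma > threshold


def decide(history: List[float], latest: float) -> Optional[Action]:
    """Map a detected anomaly to an illustrative remediation decision."""
    if not is_anomalous(history, latest):
        return None
    if latest > mean(history):
        return Action("scale_up", f"metric spiked to {latest}")
    return Action("investigate", f"metric dropped to {latest}")


def control_loop(samples: List[float], window: int = 5) -> List[Tuple[int, Action]]:
    """Observe each sample, compare it with the preceding window, and
    record the (index, action) pairs that the loop would act on."""
    actions = []
    for i in range(window, len(samples)):
        action = decide(samples[i - window:i], samples[i])
        if action is not None:
            actions.append((i, action))
    return actions


if __name__ == "__main__":
    # Assumed CPU-utilisation readings (percent) with one spike at index 6.
    cpu = [50, 52, 49, 51, 50, 51, 95, 50]
    for index, action in control_loop(cpu):
        print(index, action.kind, action.reason)
```

With the assumed readings above, only the spike at index 6 produces a decision ("scale_up"). In a real deployment the detector would more likely be a learned model and the decision would be handed to an operator or controller, which the sketch deliberately leaves out.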
