Authors :
Dr. Sahayarajjoseph Nirmalkumar S.
Volume/Issue :
Volume 11 - 2026, Issue 6 - June
Google Scholar :
https://tinyurl.com/565z6nrz
Scribd :
https://tinyurl.com/y35fnf3y
DOI :
https://doi.org/10.38124/ijisrt/26jun632
Abstract :
We propose a novel approach to neural network optimization that replaces traditional proximal operators with
meta-learned adaptive updates, unifying the selection of step sizes, regularization strengths, and sparsity
thresholds in a single learned process. The method introduces a bi-level optimization framework in which a
recurrent meta-learner dynamically produces task- and architecture-specific proximal parameters during training,
removing the need for manual tuning. At the core of the system is an LSTM-based meta-learner that takes
optimization trajectories and architectural embeddings as input; the embeddings are produced by a graph neural
network to support generalization across architectures. The resulting proximal updates combine the learned
parameters with a continuous shrinkage operator, which avoids gradient discontinuities while preserving sparsity.
The meta-learner is trained on a bi-level objective that minimizes the expected final loss over tasks, with
gradients estimated by truncated backpropagation through time. The framework integrates with standard neural
network training by replacing the usual optimizer steps with meta-learned updates. Experiments show that the
method adapts to a variety of architectures and tasks, outperforming fixed proximal approaches and reducing the
need for manual hyperparameter tuning. Furthermore, the architectural embeddings support zero-shot generalization
to novel network structures, which makes the approach well suited to automated machine learning pipelines. The
work is significant because it moves away from rigid optimization heuristics toward learned optimization
strategies that adapt to both task demands and architectural constraints.
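
The abstract does not state the exact form of the continuous shrinkage operator, so the following is only a minimal sketch under stated assumptions. It uses one possible smooth surrogate for soft-thresholding, w - lam * tanh(w / lam), and plugs it into a single proximal-gradient step. The names smooth_shrink and proximal_step, and the choice of surrogate, are illustrative and not taken from the paper; in the described framework the step size and threshold would be produced by the meta-learner rather than passed in by hand.

```python
import numpy as np

def soft_threshold(w, lam):
    """Classical proximal operator of lam * ||w||_1 (has kinks at |w| = lam)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def smooth_shrink(w, lam):
    """Assumed smooth surrogate for soft-thresholding (requires lam > 0).

    For |w| >> lam it approaches w - lam * sign(w), matching soft-thresholding.
    For |w| << lam it behaves like w**3 / (3 * lam**2), so small weights are
    pushed very close to zero, although not exactly to zero.
    """
    return w - lam * np.tanh(w / lam)

def proximal_step(w, grad, step, lam):
    """One proximal-gradient update: gradient step, then shrinkage.

    `step` and `lam` are hypothetical stand-ins for the per-step values
    a meta-learner would emit; here they are plain floats.
    """
    return smooth_shrink(w - step * grad, step * lam)

if __name__ == "__main__":
    w = np.array([1.0, 0.02, -0.5])
    grad = np.array([2.0, 0.1, -1.0])
    print(proximal_step(w, grad, step=0.1, lam=0.5))
```

Because the surrogate is differentiable everywhere, gradients with respect to both the weights and the threshold are well defined, which is what an approach trained by backpropagation through the update would require. The trade-off in this particular sketch is that small weights become tiny rather than exactly zero.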
Keywords :
Proximal Operators, Neural Network Optimization, LSTM-Driven Meta-Learner, Zero-Shot Generalization.