⚠ Official Notice: www.ijisrt.com is the official website of the International Journal of Innovative Science and Research Technology (IJISRT) Journal for research paper submission and publication. Please beware of fake or duplicate websites using the IJISRT name.



Meta-Learned Adaptive Proximal Operators for Neural Network Optimization


Authors : Dr. Sahayarajjoseph Nirmalkumar S.

Volume/Issue : Volume 11 - 2026, Issue 6 - June


Google Scholar : https://tinyurl.com/565z6nrz

Scribd : https://tinyurl.com/y35fnf3y

DOI : https://doi.org/10.38124/ijisrt/26jun632



Abstract : We propose a novel approach to neural network optimization that replaces traditional proximal operators with meta-learned adaptive updates, unifying the selection of step sizes, regularization strengths, and sparsity thresholds in a single learned process. The method uses a bi-level optimization framework in which a recurrent meta-learner produces task- and architecture-specific proximal parameters during training, removing the need for manual tuning. At its core is an LSTM-based meta-learner that processes optimization trajectories together with architectural embeddings; the embeddings are derived from a graph neural network to support generalization across architectures. The resulting proximal updates combine the learned parameters with a continuous shrinkage operator, which avoids gradient discontinuities while preserving sparsity. The meta-learner is trained on a bi-level objective that minimizes the expected final loss over tasks, with gradients estimated by truncated backpropagation through time. The framework integrates with standard neural network training by substituting meta-learned updates for the usual optimizer steps. Experiments show that the method adapts to various architectures and tasks, outperforming fixed proximal approaches and reducing the need for manual hyperparameter adjustment. The architectural embeddings also support zero-shot generalization to novel network structures, which makes the approach well suited to automated machine learning pipelines. The work replaces fixed optimization heuristics with learned optimization strategies that adapt to both task demands and architectural constraints.

Keywords : Proximal Operators, Neural Network Optimization, LSTM-Driven Meta-Learner, Zero-Shot Generalization.
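
As a rough illustration of the kind of update the abstract describes, the sketch below contrasts the classical soft-thresholding proximal operator with one possible smooth shrinkage function inside a single proximal gradient step. The abstract does not give the paper's operator, so the form x - lam * tanh(x / lam) is an assumption chosen for illustration, as are the names soft_threshold, smooth_shrink, and proximal_step. The step size and threshold are fixed constants here; in the proposed method they would be produced at each step by the meta-learner.

```python
import numpy as np


def soft_threshold(x, lam):
    """Classical proximal operator of lam * ||x||_1 (has kinks at |x| = lam)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def smooth_shrink(x, lam):
    """Smooth shrinkage surrogate (assumed form, not taken from the paper).

    Requires lam > 0. Behaves like x**3 / (3 * lam**2) near zero and like
    x - sign(x) * lam for large |x|, so small weights are pushed toward zero
    while the function stays differentiable everywhere.
    """
    return x - lam * np.tanh(x / lam)


def proximal_step(w, grad, step_size, lam, shrink=smooth_shrink):
    """One proximal gradient step: gradient move, then shrinkage.

    step_size and lam are plain constants in this sketch; a meta-learner
    would supply them per step in the setting the abstract describes.
    """
    return shrink(w - step_size * grad, step_size * lam)


# Hypothetical weights and gradients, for illustration only.
w = np.array([0.05, -0.8, 1.5])
grad = np.array([0.1, -0.2, 0.3])
w_next = proximal_step(w, grad, step_size=0.1, lam=0.5)
print(w_next)
```

Unlike soft-thresholding, this surrogate never returns exact zeros; it only drives small weights very close to zero, so a final hard threshold would be needed to obtain strictly sparse weights.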


