


Adaptive Low-Rank Hessian Approximation for Large-Scale Optimization: Theory, Algorithms, and Applications


Author : Arush Rao Lagudu

Volume/Issue : Volume 11, Issue 3 (March 2026)


Google Scholar : https://tinyurl.com/mtrcehdb

Scribd : https://tinyurl.com/zcdary3f

DOI : https://doi.org/10.38124/ijisrt/26mar1179



Abstract : We introduce the Adaptive Low-rank Hessian Approximation (ALHA) algorithm, a novel quasi-Newton method that dynamically adjusts the rank of the Hessian approximation based on local curvature information and approximation quality. Unlike classical limited-memory BFGS (L-BFGS), which maintains a fixed memory parameter m, ALHA adaptively increases or decreases the effective rank to balance computational efficiency against approximation accuracy. We establish rigorous convergence guarantees under standard assumptions: for L-smooth and µ-strongly convex functions, ALHA achieves linear convergence at rate O((1 − µ/L)^k), matching the optimal rate for first-order methods equipped with curvature information. Our analysis introduces a novel spectral approximation quality metric that governs rank adaptation and provides explicit bounds on the approximation error. We prove that the adaptive mechanism maintains bounded rank while ensuring sufficient descent at each iteration. Comprehensive numerical experiments on quadratic optimization, logistic regression on MNIST, Rosenbrock function minimization, and neural network training demonstrate that ALHA consistently outperforms fixed-rank methods, achieving up to a 40% reduction in iterations while maintaining comparable per-iteration cost. The algorithm is particularly effective for ill-conditioned problems, where adaptive rank selection captures the essential curvature directions.
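The paper's precise ALHA update rules are not reproduced on this page, so the sketch below is only a rough illustration of the idea the abstract describes: an L-BFGS-style two-loop recursion whose memory (the effective rank of the low-rank curvature model) is grown or shrunk at run time. The rank-adaptation test used here (an inverse-secant residual with ad hoc thresholds) and all names (alha_minimize, two_loop_direction, m_min, m_max) are hypothetical stand-ins, not the paper's method. For reference, the linear rate quoted in the abstract has the standard strongly convex form f(x_k) − f* ≤ (1 − µ/L)^k (f(x_0) − f*).

import numpy as np

def two_loop_direction(g, S, Y):
    # Standard L-BFGS two-loop recursion: returns -H*g, where H is the
    # inverse-Hessian approximation built from the stored (s, y) pairs.
    q = g.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if S:  # initial scaling H0 = gamma*I with gamma = s^T y / y^T y
        s, y = S[-1], Y[-1]
        q *= (s @ y) / (y @ y)
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q

def alha_minimize(f, grad, x0, m_min=2, m_max=20, tol=1e-8, max_iter=500):
    # Hypothetical adaptive-memory loop: m grows when the current model
    # predicts new curvature poorly and shrinks when it predicts it well.
    x, m = x0.astype(float), m_min
    S, Y = [], []
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = two_loop_direction(g, S[-m:], Y[-m:])
        # Backtracking line search on the Armijo sufficient-decrease condition.
        t, fx, slope = 1.0, f(x), g @ d
        while t > 1e-12 and f(x + t * d) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = x + t * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        # Rank-adaptation heuristic (an assumption, not the paper's rule):
        # the residual ||H*y - s|| / ||s|| measures how well the current
        # model explains the newly observed curvature pair.
        if S:
            Hy = -two_loop_direction(y, S[-m:], Y[-m:])
            res = np.linalg.norm(Hy - s) / (np.linalg.norm(s) + 1e-16)
            if res > 0.5:
                m = min(m + 1, m_max)
            elif res < 0.05:
                m = max(m - 1, m_min)
        if s @ y > 1e-10:  # keep only positive-curvature pairs
            S.append(s); Y.append(y)
            S, Y = S[-m_max:], Y[-m_max:]
        x, g = x_new, g_new
    return x

# Example on the Rosenbrock function, one of the abstract's test problems.
rosen = lambda x: (1 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([
    -2.0 * (1 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
    200.0 * (x[1] - x[0]**2)])
print(alha_minimize(rosen, rosen_grad, np.array([-1.2, 1.0])))  # ~ [1, 1]

The particular thresholds (0.5 to grow, 0.05 to shrink) are arbitrary; the abstract's spectral approximation quality metric presumably plays this gating role in the actual algorithm.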

Keywords : Quasi-Newton Methods, L-BFGS, Adaptive Algorithms, Low-Rank Approximation, Large-Scale Optimization, Convergence Analysis, Hessian Approximation.

