Comparative Study of Deep Learning Optimizers: SGD, Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion


Authors : Ibrahim Khalil Shehada; Rasha Ragheb Atallah; Ashraf Yunis Maghari

Volume/Issue : Volume 10 - 2025, Issue 8 - August


Google Scholar : https://tinyurl.com/yc4jf52j

Scribd : https://tinyurl.com/yrtwtvus

DOI : https://doi.org/10.38124/ijisrt/25aug007

Note : A published paper may take 4-5 working days from the publication date to appear in PlumX Metrics, Semantic Scholar, and ResearchGate.

Note : Google Scholar may take 30 to 40 days to display the article.


Abstract : Training a deep learning model means adjusting its parameters to minimize a loss function and thereby improve prediction accuracy. In supervised learning, models learn the mapping between inputs and their correct outputs from labeled examples: predictions are compared against the actual outcomes, and an optimization algorithm updates the parameters to reduce the error, iterating until convergence. This study evaluates seven optimization techniques, Stochastic Gradient Descent (SGD), Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion, in terms of training accuracy, test accuracy, training loss, and sensitivity to the learning rate. Experiments were conducted on two benchmark datasets, MNIST and CIFAR-10. On MNIST, SGD with a learning rate of 0.5 achieved the best test accuracy of 99.14% and the highest training accuracy of 99.89%. Momentum SGD and Adam also performed strongly at a learning rate of 1e-2, reaching test accuracies of 99.15% and 98%, respectively. Optimizers such as Yogi and Lion, by contrast, were competitive at lower learning rates but degraded at higher ones; at 1e-5, Lion reached a test accuracy of 98.69%. On CIFAR-10, all optimizers showed comparatively lower accuracies, reflecting the dataset's greater complexity. There, Momentum SGD outperformed optimizers including Adam, Yogi, and Lion, achieving the highest training accuracy of 98.90% and the best test accuracy of 72.94% at a learning rate of 1e-2, while Lion showed better performance and stability on both datasets at the low learning rate of 1e-5. These results underline how important it is to match the choice of optimizer and learning rate to the characteristics of each dataset.
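The listing above does not include code from the study. As a rough illustration of the kinds of update rules being compared, the following is a minimal NumPy sketch of four of the seven optimizers (SGD, Momentum SGD, Adam, and Lion) applied to a simple quadratic loss; all function names, hyperparameter defaults, and the toy loss are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def quadratic_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2, minimized at w = 0.
    return w

def sgd(w, g, state, lr):
    # Plain stochastic gradient descent: step against the gradient.
    return w - lr * g

def momentum_sgd(w, g, state, lr, beta=0.9):
    # Heavy-ball momentum: accumulate a velocity, step along it.
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def adam(w, g, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first/second moment estimates scale the step.
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["s"] = b2 * state.get("s", 0.0) + (1 - b2) * g**2
    m_hat = state["m"] / (1 - b1**t)
    s_hat = state["s"] / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps)

def lion(w, g, state, lr, b1=0.9, b2=0.99):
    # Lion: fixed-magnitude step of sign(interpolated momentum);
    # because every step has size lr, it is very learning-rate sensitive.
    m = state.get("m", 0.0)
    update = np.sign(b1 * m + (1 - b1) * g)
    state["m"] = b2 * m + (1 - b2) * g
    return w - lr * update

def run(optimizer, lr, steps=200):
    # Optimize the toy quadratic from a fixed starting point.
    w, state = np.array([3.0, -2.0]), {}
    for _ in range(steps):
        w = optimizer(w, quadratic_grad(w), state, lr)
    return w
```

Even on this toy problem the abstract's qualitative point shows up: SGD and Momentum SGD drive the loss essentially to zero at moderate learning rates, while Lion's sign-based update settles into an oscillation whose amplitude is set by the learning rate, so it favors small values such as 1e-5.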


