Authors: Ibrahim Khalil Shehada; Rasha Ragheb Atallah; Ashraf Yunis Maghari
Volume/Issue: Volume 10 - 2025, Issue 8 - August
Google Scholar: https://tinyurl.com/yc4jf52j
Scribd: https://tinyurl.com/yrtwtvus
DOI: https://doi.org/10.38124/ijisrt/25aug007
Abstract:
Training a deep learning model means adjusting its parameters to minimize a loss function and thereby improve prediction accuracy. In supervised learning, models are trained on labeled examples to learn the mapping between inputs and their correct outputs: predictions are compared with the actual outcomes, and an optimization algorithm adjusts the parameters to reduce the error, iterating until convergence. This study evaluates seven optimization techniques based on training accuracy, test accuracy, training loss, and sensitivity to the learning rate: Stochastic Gradient Descent (SGD), Momentum SGD, RMSProp, AMSGrad, Adam, Yogi, and Lion. Experiments were run on two benchmark datasets, MNIST and CIFAR-10. On MNIST, SGD with a learning rate of 0.5 achieved the best test accuracy (99.14%) and the highest training accuracy (99.89%). Momentum SGD and Adam also performed strongly at a learning rate of 1e-2, reaching test accuracies of 99.15% and 98%, respectively. Yogi and Lion, by contrast, were competitive at lower learning rates but degraded at higher ones; at 1e-5, Lion reached a test accuracy of 98.69%. On CIFAR-10, all optimizers showed lower accuracies, reflecting the dataset's greater complexity. Momentum SGD outperformed the other optimizers, including Adam, Yogi, and Lion, achieving the highest training accuracy (98.90%) and the best test accuracy (72.94%) at a learning rate of 1e-2, while Lion was more stable on both datasets at the low learning rate of 1e-5. These results underscore the importance of choosing the optimizer and learning rate to match the characteristics of each dataset.
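The iterative loop the abstract describes (compute the gradient of the loss, apply an optimizer-specific parameter update, repeat until convergence) can be illustrated with a minimal sketch. This is not the paper's experimental setup: it minimizes a toy one-dimensional quadratic loss rather than training a network on MNIST or CIFAR-10, and the update rules for SGD, Momentum SGD, Adam, and Lion follow their standard textbook formulations.

```python
import math

# Each optimizer maps (gradient, state) -> (parameter step, state).

def sgd(g, state, lr=0.1):
    return -lr * g, state

def momentum(g, state, lr=0.1, beta=0.9):
    # Heavy-ball momentum: accumulate a velocity of past gradients.
    v = beta * state.get("v", 0.0) + g
    state["v"] = v
    return -lr * v, state

def adam(g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Bias-corrected first and second moment estimates.
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return -lr * m_hat / (math.sqrt(v_hat) + eps), state

def lion(g, state, lr=0.1, b1=0.9, b2=0.99):
    # Lion: sign of an interpolated momentum gives a fixed-size step.
    m = state.get("m", 0.0)
    step = -lr * math.copysign(1.0, b1 * m + (1 - b1) * g)
    state["m"] = b2 * m + (1 - b2) * g
    return step, state

def minimize(update, w0=5.0, steps=200):
    """Run the training loop on the toy loss L(w) = w^2 (gradient 2w)."""
    w, state = w0, {}
    for _ in range(steps):
        step, state = update(2.0 * w, state)
        w += step
    return w

for name, upd in [("SGD", sgd), ("Momentum", momentum),
                  ("Adam", adam), ("Lion", lion)]:
    print(f"{name:8s} final w = {minimize(upd): .6f}")
```

All four drive `w` toward the minimum at 0; the sign-based Lion step, like in the study, is more sensitive to the learning rate, since its step size never shrinks as the gradient does.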