Authors :
Darren Kevin T. Nguemdjom; Alidor M. Mbayandjambe; Grevi B. Nkwimi; Fiston Oshasha; Célestin Muluba; Héritier I. Mbengandji; Ibsen G. BAZIE; Raphael Kpoghomou; Alain M. Kuyunsa
Volume/Issue :
Volume 10 - 2025, Issue 4 - April
Google Scholar :
https://tinyurl.com/5da2x39a
Scribd :
https://tinyurl.com/3bxf42yd
DOI :
https://doi.org/10.38124/ijisrt/25apr2118
Abstract :
This study evaluates the effectiveness of integrating multi-scale attention mechanisms, specifically the Bottleneck
Attention Module (BAM), into deep learning architectures such as ResNet18 and SqueezeNet, using the CIFAR-10 dataset.
BAM combines spatial and channel attention, enabling the simultaneous capture of local and global dependencies, thereby
enhancing the models’ ability to withstand visual perturbations and adversarial attacks. A comparison with existing
mechanisms, such as ECA-Net and CBAM, demonstrates that BAM outperforms them through its parallel design,
which jointly refines the spatial and channel dimensions while keeping computational overhead low. Potential
applications include critical domains such as medical imaging and surveillance, where precision and robustness are
essential, particularly in dynamic environments or under adversarial constraints. The study also highlights avenues for
integrating BAM with emerging architectures like Transformers to combine the advantages of long-range relationships
and multi-scale dependencies. Experimental results confirm BAM’s effectiveness: on clean data, ResNet18’s accuracy
improves from 74.83% to 90.58%, and SqueezeNet from 75.50% to 86.70%. Under adversarial conditions, BAM enhances
ResNet18’s robustness from 59.2% to 70.4% under PGD attacks, while the hybrid model achieves a maximum accuracy of
75.8%. Activation analysis reveals that BAM strengthens model interpretability by focusing attention on regions of
interest, reducing false activations and improving overall reliability. These findings position BAM as an ideal solution for
modern embedded vision systems that require an optimal balance between performance, robustness, and efficiency.
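To make the mechanism concrete, here is a minimal PyTorch sketch of a BAM-style block as described above: a channel branch and a spatial branch computed in parallel, summed, passed through a sigmoid gate, and applied residually to the input feature map. The reduction ratio and dilation follow the defaults of the original BAM paper; the class and variable names are illustrative, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    """BAM-style block: parallel channel and spatial attention, combined by
    broadcast addition, sigmoid-gated, and applied residually:
    out = x * (1 + M(x))."""

    def __init__(self, channels: int, reduction: int = 16, dilation: int = 4):
        super().__init__()
        mid = channels // reduction
        # Channel branch: global average pool -> bottleneck MLP -> (B, C, 1, 1).
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
        )
        # Spatial branch: 1x1 reduction, dilated 3x3 conv for a wider
        # receptive field, then 1x1 down to a single map -> (B, 1, H, W).
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The (B, C, 1, 1) and (B, 1, H, W) branch outputs broadcast to
        # (B, C, H, W) when summed.
        attention = torch.sigmoid(self.channel(x) + self.spatial(x))
        return x * (1.0 + attention)
```

In a ResNet18, such a block would typically sit between residual stages, which is where the original paper places it. The PGD robustness figures quoted above are conventionally obtained with the iterated attack of Madry et al. (2018); the sketch below uses assumed example values for the budget eps, step size alpha, and step count, not the paper’s reported settings.

```python
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: random start, iterated signed-gradient ascent on the
    loss, with projection back into the eps-ball and valid pixel range."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```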
Keywords :
Robustness, Adversarial Perturbations, Multi-Scale Attention, BAM, ResNet18, SqueezeNet.
References :
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. https://doi.org/10.1038/nature14539
- Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://arxiv.org/abs/1409.1556
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1-9). https://arxiv.org/abs/1409.4842
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770-778). https://arxiv.org/abs/1512.03385
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS) (Vol. 25, pp. 1097-1105). https://dl.acm.org/doi/10.1145/3065386
- Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 4700-4708). https://arxiv.org/abs/1608.06993
- Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML) (pp. 6105-6114). https://arxiv.org/abs/1905.11946
- Wightman, R., Touvron, H., & Jégou, H. (2021). ResNet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476. https://arxiv.org/abs/2110.00476
- Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360. https://arxiv.org/abs/1602.07360
- Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. https://arxiv.org/abs/1704.04861
- Han, S., Pool, J., Tran, J., & Dally, W. J. (2015). Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems (NeurIPS) (pp. 1135-1143). https://arxiv.org/abs/1506.02626
- Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1251-1258). https://arxiv.org/abs/1610.02357
- Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR) (pp. 6848-6856). https://arxiv.org/abs/1707.01083
- Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19). https://arxiv.org/abs/1807.06521
- Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-Excitation Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7132-7141). https://arxiv.org/abs/1709.01507
- Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., & Hu, Q. (2020). ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 11534-11542). https://arxiv.org/abs/1910.03151
- Li, X., Wang, W., Hu, X., & Yang, J. (2019). Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://arxiv.org/abs/1903.06586
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. https://arxiv.org/abs/1312.6199
- Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. https://arxiv.org/abs/1412.6572
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929-1958. https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
- Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. https://arxiv.org/abs/1502.03167
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. https://arxiv.org/abs/1706.06083
- Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). Mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412. https://arxiv.org/abs/1710.09412
- Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2019). AutoAugment: Learning augmentation strategies from data. arXiv preprint arXiv:1805.09501. https://arxiv.org/abs/1805.09501
- Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1-48. https://link.springer.com/article/10.1186/s40537-019-0192-0
- Zhang, H., Dai, Y., & Wang, L. (2021). Multi-scale attention networks for robust feature extraction. arXiv preprint arXiv:2108.12250. https://arxiv.org/abs/2108.12250
- Li, Q., Jiang, H., & Zhao, X. (2022). Enhanced feature learning with attention mechanisms in deep networks. arXiv preprint arXiv:2202.05296. https://arxiv.org/abs/2202.05296
- Chen, Y., Liu, Z., & Xiao, H. (2020). Improving robustness with bottleneck attention modules. arXiv preprint arXiv:2001.06487. https://arxiv.org/abs/2001.06487
- Guo, X., Zhou, Y., & Zhao, H. (2021). BAM-based networks for semantic segmentation in noisy environments. arXiv preprint arXiv:2112.12183. https://arxiv.org/abs/2112.12183
- Jiang, R., Wang, F., & Zhao, X. (2022). Adversarially robust networks with enhanced attention. arXiv preprint arXiv:2201.11089. https://arxiv.org/abs/2201.11089
- Zhao, L., Huang, W., & Lin, F. (2023). Attention-driven architectures for reliable computer vision. arXiv preprint arXiv:2303.12827. https://arxiv.org/abs/2303.12827
- Woo, S., Park, J., Lee, J.-Y., & Kweon, I. S. (2018). BAM: Bottleneck Attention Module. arXiv preprint arXiv:1807.06514. https://arxiv.org/abs/1807.06514
- Zhang, Y., Li, P., & Guo, L. (2021). SCORN: Sinter composition optimization with regressive convolutional neural network. IEEE Transactions on Industrial Informatics. https://youshanzhang.github.io/publications/
- Li, X., Li, X., Zhang, L., Cheng, G., Shi, J., Lin, Z., Tan, S., & Tong, Y. (2020). Improving semantic segmentation via decoupled body and edge supervision. In European Conference on Computer Vision (ECCV). https://scholar.google.com/citations?hl=en&user=-wOTCE8AAAAJ
- Hu, J., Shen, L., & Sun, G. (2019). Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8), 2011-2023. https://doi.org/10.1109/TPAMI.2019.2913372
- Zhao, Y., Zhang, L., & Tao, D. (2022). Attention distillation: Self-supervised vision transformer students need more guidance. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. https://arxiv.org/pdf/2210.00944
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin Transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030. https://arxiv.org/abs/2103.14030
- Chen, Z., Li, Z., Song, L., Chen, L., & Yu, J. (2021). NeRFPlayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics. https://scholar.google.com/citations?hl=en&user=4MIbSrAAAAA
- Jordan, K. (2024). 94% on CIFAR-10 in 3.29 seconds on a single GPU. arXiv preprint arXiv:2404.00498. https://arxiv.org/abs/2404.00498
- Li, X., Zhou, Y., & Wang, X. (2023). A2-Aug: Adaptive automated data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 123-132).
- Cubuk, E. D., Zoph, B., Shlens, J., & Le, Q. V. (2020). RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (pp. 702-703).
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. https://arxiv.org/abs/2010.11929
- Bykovets, E., Metz, Y., El-Assady, M., Keim, D. A., & Buhmann, J. M. (2022). BARReL: Bottleneck attention for adversarial robustness in vision-based reinforcement learning. arXiv preprint arXiv:2208.10481. https://arxiv.org/abs/2208.10481