Authors :
Mehwish Mirza; Muhammad Talha Siddiqui
Volume/Issue :
Volume 9 - 2024, Issue 10 - October
Google Scholar :
https://tinyurl.com/mspyyftd
Scribd :
https://tinyurl.com/33y8hkj3
DOI :
https://doi.org/10.38124/ijisrt/IJISRT24OCT678
Abstract :
Image captioning, which links computer vision with natural language processing, is critical for generating textual descriptions of images. This research proposes a hierarchical attention model that combines CNN image features with LSTM networks and attention mechanisms to generate captions. By utilizing both object-level and image-level features, our method improves the quality and relevance of captions and increases the variety of automated image descriptions.
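To make the idea of attending over both object-level and image-level features concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`attend`, `hierarchical_context`), the dot-product scoring, the two-level fusion scheme, and the toy vectors are all hypothetical. In a full model the `hidden` vector would come from the LSTM decoder state and the feature vectors from a CNN.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, features):
    """Dot-product soft attention (an assumed scoring function).

    Returns the attention weights and the weighted sum of the features.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return weights, context

def hierarchical_context(hidden, object_feats, image_feat):
    """Two-level attention (hypothetical fusion scheme).

    Level 1 attends over the object-level features.
    Level 2 weighs the resulting object context against the
    image-level feature to produce one fused context vector.
    """
    obj_weights, obj_context = attend(hidden, object_feats)
    level_weights, fused = attend(hidden, [obj_context, image_feat])
    return obj_weights, level_weights, fused

# Toy inputs, made up for illustration: three object vectors and one
# global image vector, all of dimension 2.
hidden = [1.0, 0.0]
object_feats = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
image_feat = [0.5, 0.5]

obj_w, lvl_w, fused = hierarchical_context(hidden, object_feats, image_feat)
```

In a captioning decoder, a fused vector of this kind would typically be fed to the LSTM at each time step alongside the previous word embedding, so that each generated word can draw on the most relevant objects as well as the overall scene.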
Keywords :
Image Captioning, Deep Learning, Artificial Intelligence, Natural Language Processing.