Image Captioning Using R-CNN & LSTM Deep Learning Model


Authors : Aditya Kumar Yadav; Prakash.J

Volume/Issue : Volume 6 - 2021, Issue 5 - May


Google Scholar : http://bitly.ws/9nMw

Scribd : https://bit.ly/3iwuVnJ


Abstract : Image Captioning is the process of creating a text description of an image. It uses both Natural Language Processing (NLP) and Computer Vision to generate the captions. The image captioning task is done by combining the detection process when the descriptions consist of a single word like cat, skateboard, etc. and Image Captioning when one predicted region covers the full image, for example cat riding a skateboard. To address the localization and description task together, we propose a Fully Convolution Localization Network that processes a picture with a single forward pass which can be consistently trained in a single round of optimization. To process an image, first the input image is processed using CNN. Then Localization Layer proposes regions and includes a region detection network adopted from faster R-CNN and captioning network. The model directly combines the faster R-CNN framework for region detection and long short-term memory (LSTM) for captioning

Keywords : Image Caption, Recurrent Neural Network, Long short-term memory, Convolution Neural Network, Faster R-CNN, Natural Language Processing

Image Captioning is the process of creating a text description of an image. It uses both Natural Language Processing (NLP) and Computer Vision to generate the captions. The image captioning task is done by combining the detection process when the descriptions consist of a single word like cat, skateboard, etc. and Image Captioning when one predicted region covers the full image, for example cat riding a skateboard. To address the localization and description task together, we propose a Fully Convolution Localization Network that processes a picture with a single forward pass which can be consistently trained in a single round of optimization. To process an image, first the input image is processed using CNN. Then Localization Layer proposes regions and includes a region detection network adopted from faster R-CNN and captioning network. The model directly combines the faster R-CNN framework for region detection and long short-term memory (LSTM) for captioning

Keywords : Image Caption, Recurrent Neural Network, Long short-term memory, Convolution Neural Network, Faster R-CNN, Natural Language Processing

Never miss an update from Papermashup

Get notified about the latest tutorials and downloads.

Subscribe by Email

Get alerts directly into your inbox after each post and stay updated.
Subscribe
OR

Subscribe by RSS

Add our RSS to your feedreader to get regular updates from us.
Subscribe