Exploring deep learning approaches for image captioning to mimic human understanding
Maheen Islam, Mahedi Hassan Ratul, Rezaul Haque, Sazzad Hossain Rony, Azharul Huq Asif, Tanni Mittra, Md Miskat Hossain, Mahamudul Hasan
Abstract
Image captioning has emerged as a vital research area in computer vision, aiming to enhance how humans interact with visual content. While progress has been made, challenges such as improving caption diversity and accuracy remain. This study proposes transfer learning models and recurrent neural network (RNN) algorithms trained on the Microsoft Common Objects in Context (MS COCO) dataset to improve image captioning quality. The models combine image and text features, pairing ResNet50, VGG16, and InceptionV3 image encoders with LSTM and BiLSTM networks. Performance is measured with the BLEU, ROUGE, and METEOR metrics under both greedy and beam search decoding. The InceptionV3+BiLSTM model outperformed the others, achieving a BLEU score of over 60%, a METEOR score of 28.6%, and a ROUGE score of 57.2%. This research contributes to building a simple yet effective image captioning model that provides accurate descriptions with human-like understanding. Errors were analyzed to improve the results, and ongoing research aimed at enhancing the diversity, fluency, and accuracy of generated captions is discussed. The work has significant implications for improving the accessibility and searchability of visual media and for informing future research in this area.
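The abstract compares greedy and beam search decoding without showing how they differ. The following is a minimal sketch of the two strategies, not the authors' implementation. `TOY_MODEL`, `next_token_log_probs`, and the `<end>` token are hypothetical stand-ins for a trained caption decoder (for example, an LSTM conditioned on image features), and the probabilities are invented purely for illustration.

```python
import math

END = "<end>"  # hypothetical end-of-caption token

# Hypothetical toy "decoder": maps a caption prefix to next-token probabilities.
# A real model would compute these from image features and the prefix.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
    ("a", "dog"): {END: 1.0},
    ("a", "cat"): {END: 1.0},
    ("the", "dog"): {END: 1.0},
    ("the", "cat"): {END: 1.0},
}


def next_token_log_probs(prefix):
    """Return log-probabilities of each possible next token for a prefix."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[tuple(prefix)].items()}


def greedy_decode(max_len=10):
    """Pick the single most likely token at every step."""
    seq, score = [], 0.0
    for _ in range(max_len):
        log_probs = next_token_log_probs(seq)
        tok = max(log_probs, key=log_probs.get)
        score += log_probs[tok]
        if tok == END:
            break
        seq.append(tok)
    return seq, score


def beam_search_decode(beam_width=2, max_len=10):
    """Keep the beam_width best partial captions at every step."""
    beams = [([], 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_token_log_probs(seq).items():
                if tok == END:
                    finished.append((seq, score + lp))
                else:
                    candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    if not finished:
        # No caption ended within max_len; fall back to the open beams.
        finished = beams
    return max(finished, key=lambda c: c[1])


if __name__ == "__main__":
    print("greedy:", greedy_decode())
    print("beam:  ", beam_search_decode(beam_width=2))
```

In this toy setup, greedy decoding commits to the locally best first word and ends with a lower-probability caption than beam search with width 2, which keeps the second-best first word alive long enough to find a better overall sequence. Setting `beam_width=1` reduces the beam search to greedy decoding.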
Keywords
Caption generation; Context dataset; Deep learning; Image captioning; Image encoding; Microsoft common objects in context
DOI:
https://doi.org/10.11591/eei.v14i4.8885
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Bulletin of Electrical Engineering and Informatics (BEEI), ISSN: 2089-3191, e-ISSN: 2302-9285. This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).