Exploring deep learning approaches for image captioning to mimic human understanding

Maheen Islam, Mahedi Hassan Ratul, Rezaul Haque, Sazzad Hossain Rony, Azharul Huq Asif, Tanni Mittra, Md Miskat Hossain, Mahamudul Hasan

Abstract


Image captioning has emerged as a vital research area in computer vision, aiming to enhance how humans interact with visual content. Despite recent progress, challenges such as improving caption diversity and accuracy remain. This study proposes transfer learning models combined with recurrent neural network (RNN) algorithms, trained on the Microsoft Common Objects in Context (MS COCO) dataset, to improve image captioning quality. The models combine image and text features, utilizing ResNet50, VGG16, and InceptionV3 with LSTM and BiLSTM. Performance is measured with the BLEU, ROUGE, and METEOR metrics under both greedy and beam search decoding. The InceptionV3+BiLSTM model outperformed the others, achieving a BLEU score of over 60%, a METEOR score of 28.6%, and a ROUGE score of 57.2%. This research contributes to building a simple yet effective image captioning model that provides accurate descriptions with human-like understanding. Errors were analyzed to improve results, and ongoing research aimed at enhancing the diversity, fluency, and accuracy of generated captions is discussed, with significant implications for improving the accessibility and searchability of visual media and for informing future research in this area.
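
The abstract compares greedy and beam search decoding. As an illustration only, the difference can be sketched with a toy next-token table. The table `TOY_MODEL`, the helper `next_probs`, and the token names are hypothetical and are not taken from the paper; a real captioning decoder would condition on image features and a learned vocabulary.

```python
import math

END = "<end>"

# Hypothetical toy distribution: caption prefix -> next-token probabilities.
# This stands in for a trained decoder and is an assumption for illustration.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.55, "cat": 0.45},
    ("the",): {"dog": 0.9, "cat": 0.1},
}


def next_probs(prefix):
    """Return next-token probabilities; unknown prefixes end the caption."""
    return TOY_MODEL.get(tuple(prefix), {END: 1.0})


def greedy_decode(max_len=5):
    """Pick the single most probable token at every step."""
    tokens = []
    for _ in range(max_len):
        probs = next_probs(tokens)
        best = max(probs, key=probs.get)
        if best == END:
            break
        tokens.append(best)
    return tokens


def beam_search(beam_width=2, max_len=5):
    """Keep the beam_width highest-scoring partial captions at every step."""
    # Each beam is (tokens, summed log-probability, finished flag).
    beams = [([], 0.0, False)]
    for _ in range(max_len):
        candidates = []
        for tokens, score, done in beams:
            if done:
                candidates.append((tokens, score, True))
                continue
            for tok, p in next_probs(tokens).items():
                new_score = score + math.log(p)
                if tok == END:
                    candidates.append((tokens, new_score, True))
                else:
                    candidates.append((tokens + [tok], new_score, False))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(done for _, _, done in beams):
            break
    return beams[0][0]


if __name__ == "__main__":
    print("greedy:", " ".join(greedy_decode()))
    print("beam  :", " ".join(beam_search(beam_width=2)))
```

In this toy table, greedy decoding commits to "a" first and ends with a caption of probability 0.6 × 0.55 = 0.33, while a beam of width 2 keeps "the" alive and finds a caption of probability 0.4 × 0.9 = 0.36. With a beam width of 1, the sketch reduces to greedy decoding.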

Keywords


Caption generation; Context dataset; Deep learning; Image captioning; Image encoding; Microsoft common objects in context


DOI: https://doi.org/10.11591/eei.v14i4.8885


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).