Image Caption Generation Using Deep Learning Algorithm

Shan-E-Fatima; Kratika Gupta; Deepti Goyal; Suman Kumar Mishra

doi:10.53555/kuey.v30i5.4311

Published: May 23, 2024

Keywords:

Image Captioning, Deep Learning, VGG16, LSTM, Flickr8K Dataset, Evaluation Metrics, Semantic Gap, Human-Computer Interaction.

Abstract

This study investigates the effectiveness of an image captioning model built on the VGG16 and LSTM architectures, using the Flickr8K dataset of 8,000 images paired with textual descriptions. The workflow comprised data preprocessing, feature extraction with VGG16, LSTM training, and tuning of model parameters and hyperparameters. The model was evaluated with the BLEU score, a semantic similarity score, and ROUGE scores. The BLEU score indicated moderate overlap with the reference captions, while the semantic similarity score was high. The ROUGE scores, however, revealed difficulty in maintaining coherence and capturing higher-order linguistic structures. These findings give insight into the model's capabilities and limitations in generating descriptive captions, contribute to the broader understanding of image captioning techniques, and offer guidance for future work. The research has implications for computer vision, natural language processing, and human-computer interaction: by bridging the semantic gap between visual content and textual descriptions, image captioning models can enhance accessibility, improve image understanding, and facilitate human-machine communication. Although the model performs promisingly in capturing semantic content, there is room for improvement, including refining the model architecture, integrating attention mechanisms, and using larger datasets. Continued innovation in image captioning promises advanced systems with widespread applications across industries and disciplines.
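
To make the overlap metrics named in the abstract concrete, the following is a minimal sketch of a sentence-level BLEU score and a ROUGE-L score in plain Python. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say which BLEU order, smoothing, or ROUGE variant was used, so the choices here (uniform n-gram weights up to `max_n`, no smoothing, ROUGE-L reported as an F1 score) and all function names are hypothetical. The semantic similarity score is left out because the abstract does not describe how it was computed.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed variant)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its largest count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: compare against the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up example captions, not taken from Flickr8K.
generated = "a dog runs on the grass".split()
reference = "a dog is running on the grass".split()
print("BLEU-2:", round(bleu(generated, [reference]), 3))
print("ROUGE-L:", round(rouge_l(generated, reference), 3))
```

Both metrics reward shared word sequences, so a caption that paraphrases the reference can score low on them while still being close in meaning, which is consistent with the abstract's contrast between moderate BLEU overlap and high semantic similarity.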

How to Cite

Shan-E-Fatima, Kratika Gupta, Deepti Goyal, & Suman Kumar Mishra. (2024). Image Caption Generation Using Deep Learning Algorithm. Educational Administration: Theory and Practice, 30(5), 8118–8128. https://doi.org/10.53555/kuey.v30i5.4311

Issue

Vol. 30 No. 5 (2024)

Section

Articles

Author Biographies

Shan-E-Fatima

Assistant Professor, Khwaja Moinuddin Chishti Language University, Lucknow

Kratika Gupta

Research Scholar, Lingaya's Vidyapeeth, Faridabad, Haryana

Deepti Goyal

Research Scholar, Lingaya's Vidyapeeth, Faridabad, Haryana

Suman Kumar Mishra

Assistant Professor, Khwaja Moinuddin Chishti Language University, Lucknow
