Image Caption Generation Using Deep Learning Algorithm

Shan-E-Fatima
Kratika Gupta
Deepti Goyal
Suman Kumar Mishra

Abstract

This study investigates the effectiveness of an image captioning model that combines VGG16 and LSTM architectures on the Flickr8K dataset, which comprises 8,000 images paired with textual descriptions. The workflow consisted of data preprocessing, image feature extraction with VGG16, LSTM training, and evaluation, with model parameters and hyperparameters tuned to improve performance. The model was evaluated with the BLEU score, a Semantic Similarity score, and ROUGE scores. The BLEU score indicated moderate overlap with the reference captions, while the Semantic Similarity score indicated a high degree of semantic similarity. The ROUGE scores, however, revealed difficulty in maintaining coherence and capturing higher-order linguistic structures. The findings offer insight into the model's capabilities and limitations and provide guidance for future work on image captioning. They have implications for computer vision, natural language processing, and human-computer interaction: by bridging the semantic gap between visual content and textual descriptions, image captioning models can enhance accessibility, improve image understanding, and facilitate human-machine communication. Although the model captures semantic content well, there is room for improvement, including refining the model architecture, integrating attention mechanisms, and leveraging larger datasets.
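
The n-gram overlap that underlies the BLEU and ROUGE scores mentioned in the abstract can be illustrated with a minimal sketch. This is a simplified illustration, not the authors' evaluation code: the function names (`ngrams`, `clipped_precision`, `rouge_n_recall`) and the example captions are assumptions made here, and full BLEU additionally applies a brevity penalty and a geometric mean over several n-gram orders.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """BLEU-style modified n-gram precision.

    Each candidate n-gram is credited at most as many times as it
    appears in any single reference caption.
    """
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    overlap = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return overlap / sum(cand.values())


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: share of reference n-grams found in the candidate."""
    ref = ngrams(reference, n)
    if not ref:
        return 0.0
    cand = ngrams(candidate, n)
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())


# Hypothetical captions, invented for illustration only.
generated = "a dog runs on the grass".split()
reference = "a dog is running on the grass".split()

p1 = clipped_precision(generated, [reference], 1)  # 5 of 6 unigrams match
p2 = clipped_precision(generated, [reference], 2)  # 3 of 5 bigrams match
r2 = rouge_n_recall(generated, reference, 2)       # 3 of 6 reference bigrams
print(p1, p2, r2)
```

In this toy case the two captions describe the same scene, yet the bigram scores drop well below the unigram score because "runs" and "is running" share no surface n-grams. This is one way a caption can be semantically close to its reference while scoring only moderately on overlap-based metrics.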

Article Details

How to Cite

Shan-E-Fatima, Kratika Gupta, Deepti Goyal, & Suman Kumar Mishra. (2024). Image Caption Generation Using Deep Learning Algorithm. Educational Administration: Theory and Practice, 30(5), 8118–8128. https://doi.org/10.53555/kuey.v30i5.4311

Section

Articles

Author Biographies

Shan-E-Fatima

Assistant Professor, Khwaja Moinuddin Chishti Language University, Lucknow

Kratika Gupta

Research Scholar, Lingaya's Vidyapeeth, Faridabad, Haryana

Deepti Goyal

Research Scholar, Lingaya's Vidyapeeth, Faridabad, Haryana

Suman Kumar Mishra

Assistant Professor, Khwaja Moinuddin Chishti Language University, Lucknow