On The Automatic Image Captioning Task In Italian: A Human-Centric Approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic Image Captioning (AIC) combines two distinct machine learning disciplines, Computer Vision (CV) and Natural Language Processing (NLP), forming a challenging research landscape. Much AIC research has been carried out for English, facilitated by the availability of English-language datasets; the same is not true for other languages. The aim of this work is to investigate the compatibility of existing AIC models with the Italian language. The research targets reduced complexity, improved efficiency, and models that can be readily trained on smaller datasets while maintaining an acceptable level of caption quality. The popular Show and Tell encoder-decoder model was selected and trained on subsets of the MSCOCO-it dataset of 10k, 20k, and 30k, with and without human-validated captions, and then evaluated with BLEU-3/4, METEOR, ROUGE-L, and CIDEr. The model was also tested on unseen images. The evaluation results were not promising, with scores below 50%. As future work, we intend to investigate multimodal augmentation techniques for image-caption datasets. To enhance interpretability, explainable AI (XAI) methods such as attention visualization and saliency mapping will be employed to reveal which image regions and linguistic features influenced caption generation. Furthermore, a gender-aware evaluation is planned, assessing whether generated captions reproduce gender bias present in the training data. This dual focus on explainability and inclusivity strengthens trust in AIC systems while supporting fairer and more transparent applications.
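As an illustration of the n-gram overlap scoring behind the BLEU-3/4 figures mentioned in the abstract, the following is a minimal sketch of sentence-level BLEU with clipped n-gram precision and a brevity penalty. It is not the authors' evaluation code: the function names, the whitespace tokenization, the absence of smoothing, and the Italian example captions are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-max_n with uniform weights and no smoothing
    (an illustrative simplification; any zero precision yields 0.0)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical captions, tokenized on whitespace.
reference = "un cane corre sulla spiaggia".split()
candidate = "un cane corre sul prato".split()

bleu3 = bleu(candidate, [reference], max_n=3)
bleu4 = bleu(candidate, [reference], max_n=4)
print(f"BLEU-3: {bleu3:.4f}  BLEU-4: {bleu4:.4f}")
```

In this example the candidate shares no 4-gram with the reference, so the unsmoothed BLEU-4 is zero while BLEU-3 is not. This shows how sensitive the higher-order scores are to a short caption that diverges in only a few words.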

Original language: English
Title of host publication: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
Publisher: Association for Computing Machinery (ACM)
Pages: 75-78
Number of pages: 4
ISBN (Electronic): 9798400721533
Publication status: Published - 16 Feb 2026
Event: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 - Kildare, Ireland
Duration: 21 Jan 2026 to 22 Jan 2026

Publication series

Name: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice

Conference

Conference: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
Country/Territory: Ireland
City: Kildare
Period: 21/01/26 to 22/01/26

Keywords

  • explainable AI
  • gender bias
  • image captioning
