TY - GEN
T1 - On The Automatic Image Captioning Task In Italian
T2 - 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
AU - De Amicis, Valentina
AU - Jaiswal, Rajesh
AU - Perez-Tellez, Fernando
N1 - Publisher Copyright: © 2026 Copyright held by the owner/author(s).
PY - 2026/2/16
Y1 - 2026/2/16
N2 - Automatic Image Captioning (AIC) combines two distinct machine learning disciplines, Computer Vision (CV) and Natural Language Processing (NLP), forming a challenging research landscape. Much AIC research has been carried out for English, facilitated by the availability of English-language datasets. However, this is not true for other languages. The aim of this work is to investigate how well existing AIC models transfer to the Italian language. The research focuses on models with reduced complexity and improved efficiency that can be readily trained on smaller datasets while maintaining an acceptable level of caption quality. The popular Show and Tell encoder-decoder model was selected and trained on 10k, 20k, and 30k subsets of the MSCOCO-it dataset, with and without human-validated captions, and then evaluated with BLEU-3/4, METEOR, ROUGE-L, and CIDEr. The model was also tested on unseen images. The evaluation results were not promising: scores were below 50%. We intend to investigate multimodal augmentation techniques for image-caption datasets. To enhance interpretability, explainable AI (XAI) methods such as attention visualization and saliency mapping are to be employed to reveal which image regions and linguistic features influence caption generation. Furthermore, a gender-aware evaluation is planned, assessing whether generated captions reproduce gender bias present in the training data. This dual focus on explainability and inclusivity strengthens trust in AIC systems while supporting fairer and more transparent applications.
KW - explainable AI
KW - gender bias
KW - image captioning
UR - https://www.scopus.com/pages/publications/105031772422
U2 - 10.1145/3777490.3777491
DO - 10.1145/3777490.3777491
M3 - Conference contribution
AN - SCOPUS:105031772422
T3 - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
SP - 75
EP - 78
BT - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
PB - Association for Computing Machinery (ACM)
Y2 - 21 January 2026 through 22 January 2026
ER -