Generating diverse and meaningful captions: Unsupervised specificity optimization for image captioning

Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Image Captioning is a task that requires models to acquire a multimodal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online (https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captioning).

Original language: English
Title of host publication: Artificial Neural Networks and Machine Learning – ICANN 2018 - 27th International Conference on Artificial Neural Networks, 2018, Proceedings
Editors: Vera Kurkova, Barbara Hammer, Yannis Manolopoulos, Lazaros Iliadis, Ilias Maglogiannis
Publisher: Springer Verlag
Pages: 176-187
Number of pages: 12
ISBN (Print): 9783030014179
Publication status: Published - 2018
Event: 27th International Conference on Artificial Neural Networks, ICANN 2018 - Rhodes, Greece
Duration: 4 Oct 2018 to 7 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11139 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Artificial Neural Networks, ICANN 2018
Country/Territory: Greece
City: Rhodes
Period: 4/10/18 to 7/10/18

Keywords

  • Computer vision
  • Contrastive learning
  • Deep learning
  • Diversity
  • Image captioning
  • Image retrieval
  • MS COCO
  • Machine learning
  • Multimodal training
  • Natural language generation
  • Natural language processing
  • Neural networks
  • Specificity
