Entity-grounded image captioning

Annika Lindh

    Research output: Contribution to conference › Paper › peer-review

    Abstract

    A pressing limitation of current Image Captioning models is their tendency to produce generic captions that omit the interesting details that make each image unique. To address this limitation, we propose an approach that enforces a stronger alignment between image regions and specific segments of text. The model architecture is composed of a visual region proposer, a region-order planner and a region-guided caption generator. The region-guided caption generator incorporates a novel information gate that allows visual and textual inputs of different frequencies and dimensionalities to be combined in a Recurrent Neural Network.
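    The abstract does not describe the internals of the information gate, so the sketch below is only one plausible reading: a learned sigmoid gate that projects a visual region feature into the RNN's hidden-state space and decides how much of it to admit, applied at region boundaries (low frequency) while the word-level RNN advances at every step (high frequency). All names here (InformationGate, region_dim, the GRU cell choice, the feature sizes) are hypothetical and not taken from the paper.

    ```python
    # Illustrative sketch only; module names, dimensions, and the GRU choice are
    # assumptions, not the architecture described in the paper.
    import torch
    import torch.nn as nn

    class InformationGate(nn.Module):
        """Gates a visual region vector into the hidden state of a word-level RNN,
        letting inputs of different dimensionality (and update frequency) be mixed."""
        def __init__(self, region_dim: int, hidden_dim: int):
            super().__init__()
            self.project = nn.Linear(region_dim, hidden_dim)             # align dimensionalities
            self.gate = nn.Linear(region_dim + hidden_dim, hidden_dim)   # how much visual info to admit

        def forward(self, region_feat: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
            g = torch.sigmoid(self.gate(torch.cat([region_feat, hidden], dim=-1)))
            return hidden + g * torch.tanh(self.project(region_feat))

    # Usage: the gate fires once per proposed region, while the RNN steps per word.
    rnn = nn.GRUCell(input_size=300, hidden_size=512)
    gate = InformationGate(region_dim=2048, hidden_dim=512)

    hidden = torch.zeros(1, 512)
    region = torch.randn(1, 2048)              # one proposed region's feature vector
    hidden = gate(region, hidden)              # inject visual evidence at the region boundary
    for word_emb in torch.randn(5, 1, 300):    # several word steps guided by that region
        hidden = rnn(word_emb, hidden)
    ```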
    Original language: English
    DOIs
    Publication status: Published - 2018
    Event: ECCV 2018 Workshop on Shortcomings in Vision and Language (SiVL) - Munich, Germany
    Duration: 8 Sep 2018 - 8 Sep 2018

    Conference

    Conference: ECCV 2018 Workshop on Shortcomings in Vision and Language (SiVL)
    Country/Territory: Germany
    City: Munich
    Period: 8/09/18 - 8/09/18

    Keywords

    • Image Captioning
    • visual region proposer
    • region-order planner
    • region-guided caption generator
    • information gate
    • Recurrent Neural Network
