Attention driven reference resolution in multimodal contexts

J. D. Kelleher

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

In recent years a number of psycholinguistic experiments have pointed to the interaction between language and vision, in particular the interaction between visual attention and linguistic reference. In parallel with this, several theories of discourse have attempted to provide an account of the relationship between types of referential expressions on the one hand and the degree of mental activation on the other. Building on both of these traditions, this paper describes an attention-based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as the basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during the resolution process, with the highest-scoring candidate selected as the referent. One advantage of this approach is that resolution occurs within the full multimodal context, insofar as the referent is selected from the full list of objects in the multimodal context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous.
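
The weighted integration and ranking described in the abstract can be pictured with a short sketch. The Python fragment below is a minimal illustration, not the paper's implementation: the entity representation, the FORM_WEIGHTS mapping from referential form to modality weights, the salience values, and the ambiguity_margin threshold are all hypothetical choices introduced here for the example.

    from dataclasses import dataclass

    # Hypothetical weights per referential form: pronouns lean on linguistic
    # (discourse) salience, full definite NPs lean on visual salience.
    # The numbers are illustrative only.
    FORM_WEIGHTS = {
        "pronoun": (0.7, 0.3),       # (linguistic weight, visual weight)
        "definite_np": (0.3, 0.7),
    }

    @dataclass
    class Entity:
        name: str
        linguistic_salience: float   # linguistic attention score for this entity
        visual_salience: float       # visual attention score for this entity

    def integrated_score(entity, w_ling, w_vis):
        # Weighted integration of the linguistic and visual attention scores.
        return w_ling * entity.linguistic_salience + w_vis * entity.visual_salience

    def resolve_reference(entities, referential_form, ambiguity_margin=0.05):
        # Rank every entity in the multimodal context by its integrated score,
        # select the top candidate, and flag potential ambiguity when the top
        # two scores are very close.
        w_ling, w_vis = FORM_WEIGHTS[referential_form]
        ranked = sorted(entities,
                        key=lambda e: integrated_score(e, w_ling, w_vis),
                        reverse=True)
        ambiguous = (len(ranked) > 1 and
                     integrated_score(ranked[0], w_ling, w_vis)
                     - integrated_score(ranked[1], w_ling, w_vis) < ambiguity_margin)
        return ranked[0], ambiguous

    # Example: a pronoun favours the linguistically salient entity, while the
    # referent is still chosen from the full list of objects in the context.
    context = [Entity("red box", 0.8, 0.2), Entity("blue ball", 0.3, 0.9)]
    referent, ambiguous = resolve_reference(context, "pronoun")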

Original language: English
Pages (from-to): 21-35
Number of pages: 15
Journal: Artificial Intelligence Review
Volume: 25
Issue number: 1-2
DOIs
Publication status: Published - Apr 2006

Keywords

  • Attention
  • Natural language processing
  • Reference resolution
  • Salience
  • Situated dialog
  • Vision and language
