Generalized Zero-Shot Learning for Action Recognition Fusing Text and Image GANs

Research output: Contribution to journal › Article › peer-review

Abstract

Generalized Zero-Shot Action Recognition (GZSAR) aims to recognize classes that the model has not been trained on while maintaining robust performance on the classes seen during training. This reduces the need for large amounts of labeled training data and makes more efficient use of available datasets. The main contribution of this paper is a novel approach to GZSAR that combines two Generative Adversarial Networks (GANs): one generates embeddings from visual representations, and the other generates embeddings from textual representations. The generated embeddings are fused by selecting the maximum value from the arrays that represent them, and the fused data is then used to train a GZSAR classifier in a supervised manner. The framework also incorporates a feature refinement component and an out-of-distribution detector to mitigate the domain shift problem between seen and unseen classes. On the UCF101 benchmark dataset, we achieved a 7.43% relative increase in performance, from 50.93% (using images and Word2Vec alone) to 54.71% with two GANs. On the HMDB51 dataset, we obtained a 7.06% relative improvement, from 36.11% (using Text and Word2Vec) to 38.66% with the dual-GAN approach. These results underscore the efficacy of the dual-GAN framework in enhancing GZSAR performance. The rest of the paper details these contributions and highlights potential future lines of research in this area.
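
The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes that "selecting the maximum value" means an element-wise maximum over two embedding vectors of equal dimensionality; the abstract does not confirm this reading. The function name `fuse_max` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_max(visual: Sequence[float], textual: Sequence[float]) -> List[float]:
    """Fuse two embeddings by keeping the larger value at each position.

    Assumption: both embeddings have the same dimensionality, and the
    fusion is an element-wise maximum (one reading of the abstract).
    """
    if len(visual) != len(textual):
        raise ValueError("embeddings must have the same dimensionality")
    return [max(v, t) for v, t in zip(visual, textual)]


# Toy 4-dimensional embeddings (made-up numbers, not results from the paper).
visual_embedding = [0.2, 0.9, -0.1, 0.4]
textual_embedding = [0.5, 0.3, 0.0, 0.4]

fused = fuse_max(visual_embedding, textual_embedding)
print(fused)  # [0.5, 0.9, 0.0, 0.4]
```

Under this assumption, the fused vector keeps the dimensionality of its inputs, so a downstream classifier could be trained on it in the same way as on a single-source embedding.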

Original language: English
Pages (from-to): 5188-5202
Number of pages: 15
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Keywords

  • Generalized zero-shot action recognition
  • generalized zero-shot learning
  • generative adversarial networks
  • human action recognition
