Skeleton-based human action recognition with sequential convolutional-LSTM networks and fusion strategies

  • Sunder Ali Khowaja
  • Seok Lyong Lee

Research output: Contribution to journal › Article › peer-review

Abstract

Human action recognition from skeleton data has drawn considerable attention from researchers due to the availability of thousands of real videos with many challenges. Existing works attempted to model the spatial characteristics and temporal dependencies of 3D joints using dynamic time warping, hand-crafted features, and spatial co-occurrence features. However, the representation derived from the spatial stream overemphasizes the temporal information; thus, it yields limited expressive power. Some studies use skeleton sequences as frames to enhance the expressive power of representations but lose generalization capability because the derived temporal smoothness is specific to a particular dataset. The proposed work uses joint distance maps as a base representation that encodes the spatial and temporal information into color texture images. We increase the expressive power by extracting feature maps from networks pre-trained on ImageNet to diversify the texture representation, and we propose a network architecture to model the temporal dependency explicitly. We also explore various fusion strategies to generate diverse representations from the feature maps of the pre-trained networks. The experimental results show that the proposed method achieves the best recognition accuracy when using decision-level fusion with meta-learners (Random Forest). The analysis also reveals that feature-level fusion offers a relatively good trade-off, i.e., recognition performance on par with some decision-level fusion strategies while having fewer tunable parameters. Extensive experimental results and comparative analysis on three benchmark datasets show that the proposed representation and network not only yield better recognition accuracy but also exhibit stronger generalization capability on multiple datasets.
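The abstract names three building blocks: joint distance maps, feature-level fusion, and decision-level fusion. The sketch below is only a minimal illustration of those ideas under stated assumptions and is not the authors' implementation. The function names, the row-per-joint-pair layout, the 0..255 scaling, and the score averaging are all hypothetical choices made for the example; in particular, averaging stands in for the Random Forest meta-learner mentioned in the abstract.

```python
import math
from itertools import combinations


def joint_distance_map(sequence):
    """Build a simple joint distance map (hypothetical layout).

    sequence: list of frames, each frame a list of (x, y, z) joints.
    Returns a 2D list: one row per joint pair, one column per frame,
    with distances scaled to integers in 0..255.
    """
    num_joints = len(sequence[0])
    pairs = list(combinations(range(num_joints), 2))
    # Euclidean distance of every joint pair in every frame.
    raw = [[math.dist(frame[i], frame[j]) for frame in sequence]
           for i, j in pairs]
    lo = min(min(row) for row in raw)
    hi = max(max(row) for row in raw)
    span = (hi - lo) or 1.0  # avoid division by zero on constant input
    return [[round(255 * (d - lo) / span) for d in row] for row in raw]


def feature_level_fusion(feature_vectors):
    """Concatenate per-network feature vectors into one vector."""
    return [value for vec in feature_vectors for value in vec]


def decision_level_fusion(score_vectors):
    """Average per-network class scores and return the winning class index.

    Assumption: plain averaging is used here for brevity; it is a stand-in
    for a trained meta-learner such as a Random Forest.
    """
    n = len(score_vectors)
    fused = [sum(column) / n for column in zip(*score_vectors)]
    return max(range(len(fused)), key=fused.__getitem__)


# Two frames of a three-joint toy skeleton.
frames = [
    [(0, 0, 0), (3, 4, 0), (0, 0, 5)],
    [(0, 0, 0), (1, 0, 0), (0, 0, 0)],
]
jdm = joint_distance_map(frames)
fused_features = feature_level_fusion([[1, 2], [3]])
predicted_class = decision_level_fusion([[0.2, 0.8], [0.6, 0.4]])
```

The map produced here is a single-channel intensity grid; mapping those intensities to colors, as the abstract describes, would be a further step that this sketch leaves out.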

Original language: English
Pages (from-to): 3729-3746
Number of pages: 18
Journal: Journal of Ambient Intelligence and Humanized Computing
Volume: 13
Issue number: 8
Publication status: Published - Aug 2022
Externally published: Yes

Keywords

  • 3D skeleton data
  • Convolutional neural networks
  • Decision-level fusion
  • Feature-level fusion
  • Human action recognition
  • Long-short term memory networks
