Published December 15, 2021
| Version v1
Conference paper
Self-Supervised Video Pose Representation Learning for Occlusion-Robust Action Recognition
Contributors
Others:
- Spatio-Temporal Activity Recognition Systems (STARS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- Université Côte d'Azur (UCA)
- Toyota Motor Europe
- ANR-19-P3IA-0002, 3IA@cote d'azur, 3IA Côte d'Azur (2019)
Description
Action recognition based on human pose has witnessed increasing attention due to its robustness to changes in appearance, environment, and viewpoint. Despite this progress, one remaining challenge is occlusion in real-world videos, which hinders the visibility of all joints. Such occlusion impedes the representation of these scenes by models trained on full-body pose data obtained in laboratory conditions with specific sensors. To address this, as a first contribution, we introduce OR-VPE, a novel video pose embedding network designed to learn an occlusion-robust representation for pose sequences in videos. To enable our embedding network to handle partially visible joints, we incorporate a sub-graph data augmentation mechanism, which simulates occlusions during training, into a video pose encoder based on Graph Convolutional Networks (GCNs). As a second contribution, we apply a contrastive learning module to train the video pose representation in a self-supervised manner, without the need for action annotations. This is achieved by maximizing the mutual information between different spatio-temporal sub-graphs pruned from the same pose sequence. Experimental analyses show that, compared to training the same encoder from scratch, our proposed OR-VPE, pre-trained on the large-scale NTU-RGB+D 120 dataset, improves downstream action classification performance on the Toyota Smarthome, N-UCLA and Penn Action datasets.
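The two mechanisms described above can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the function names (`subgraph_augment`, `contrastive_loss`, `train_step`), the NT-Xent-style objective, the spatial-only joint masking, and the `keep_ratio` and `temperature` values are all assumptions made here, only to show how sub-graph augmentation and contrastive training over two views of the same pose sequence could fit together.

```python
import torch
import torch.nn.functional as F

def subgraph_augment(pose, keep_ratio=0.7):
    """Simulate occlusion by zeroing out a random subset of joints
    (a spatial sub-graph) for the whole sequence.
    pose: (batch, channels, frames, joints) skeleton tensor."""
    B, C, T, V = pose.shape
    mask = (torch.rand(B, 1, 1, V, device=pose.device) < keep_ratio).float()
    return pose * mask

def contrastive_loss(z1, z2, temperature=0.1):
    """NT-Xent-style loss: embeddings of two sub-graph views of the
    same sequence are positives; other sequences in the batch act as
    negatives. Minimizing this maximizes a lower bound on the mutual
    information between the two views."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def train_step(encoder, pose_batch, optimizer):
    """One self-supervised step; `encoder` stands in for the GCN-based
    video pose encoder, whose architecture is not specified here."""
    view1 = subgraph_augment(pose_batch)
    view2 = subgraph_augment(pose_batch)
    loss = contrastive_loss(encoder(view1), encoder(view2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For brevity the mask above prunes joints only spatially; the method as described prunes spatio-temporal sub-graphs, which would additionally mask frame ranges.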
Additional details
Identifiers
- URL
- https://hal.archives-ouvertes.fr/hal-03476564
- URN
- urn:oai:HAL:hal-03476564v1
Origin repository
- UNICA