ImaGINator: Conditional Spatio-Temporal GAN for Video Generation
- Affiliations:
- Spatio-Temporal Activity Recognition Systems (STARS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- Université Côte d'Azur (UCA)
- University of Warsaw (UW)
- ANR-17-CE39-0002, ENVISION, Automatic holistic analysis of individuals using computer vision techniques (2017)
- ANR-19-P3IA-0002, 3IA@cote d'azur, 3IA Côte d'Azur (2019)
Description
Generating human videos from a single image entails the challenging simultaneous generation of realistic and visually appealing appearance and motion. In this context, we propose a novel conditional GAN architecture, ImaGINator, which, given a single image, a condition (label of a facial expression or action) and noise, decomposes appearance and motion in both the latent and high-level feature spaces, generating realistic videos. This is achieved by (i) a novel spatio-temporal fusion scheme, which generates dynamic motion while retaining appearance throughout the full video sequence by transmitting the appearance (originating from the single image) through all layers of the network. In addition, we propose (ii) a novel transposed (1+2)D convolution, factorizing transposed 3D convolutional filters into separate transposed temporal and spatial components, which yields significant gains in video quality and speed. We extensively evaluate our approach on the facial expression datasets MUG and UvA-NEMO, as well as on the action datasets NATOPS and Weizmann. We show that our approach achieves significantly better quantitative and qualitative results than the state of the art.
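The factorization in (ii) can be sketched in PyTorch: a transposed 3D convolution is replaced by a transposed temporal convolution (kernel k×1×1) followed by a transposed spatial convolution (kernel 1×k×k). This is a minimal illustration of the general (1+2)D idea, not the paper's code; the layer names, intermediate channel count, and activation choice are assumptions.

```python
import torch
import torch.nn as nn

class TransposedConv1Plus2D(nn.Module):
    """Sketch of a transposed (1+2)D convolution: a transposed 3D
    convolution factorized into a temporal component (k x 1 x 1)
    followed by a spatial component (1 x k x k). Hyperparameters
    here are illustrative assumptions."""

    def __init__(self, in_ch, out_ch, k=4, stride=2, mid_ch=None):
        super().__init__()
        mid_ch = mid_ch or out_ch  # intermediate channels (assumption)
        # Upsample along the temporal axis only.
        self.temporal = nn.ConvTranspose3d(
            in_ch, mid_ch, kernel_size=(k, 1, 1),
            stride=(stride, 1, 1), padding=(1, 0, 0))
        # Upsample along height and width only.
        self.spatial = nn.ConvTranspose3d(
            mid_ch, out_ch, kernel_size=(1, k, k),
            stride=(1, stride, stride), padding=(0, 1, 1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (N, C, T, H, W) -> doubles T, H and W with these settings
        return self.act(self.spatial(self.act(self.temporal(x))))
```

Compared with a full transposed 3D convolution with a k×k×k kernel, this factorization uses fewer parameters per upsampling block and separates the temporal and spatial modeling, which is the source of the speed and quality gains reported in the abstract.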
Audience
- International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-02368319
- URN
- urn:oai:HAL:hal-02368319v1
- Origin repository
- UNICA