In this paper, we address the detection of daily living activities in long-term untrimmed videos. The detection of daily living activities is challenging due to their long temporal components, low inter-class variation and high intra-class variation. To tackle these challenges, recent approaches based on Temporal Convolutional Networks (TCNs)...
September 18, 2019 (v1) Conference paper
Uploaded on: December 4, 2022 -
December 19, 2022 (v1) Publication
Current self-supervised approaches for skeleton action representation learning often focus on constrained scenarios, where videos and skeleton data are recorded in laboratory settings. When dealing with estimated skeleton data in real-world videos, such methods perform poorly due to the large variations across subjects and camera viewpoints. To...
Uploaded on: February 22, 2023 -
November 22, 2021 (v1) Conference paper
Action recognition based on skeleton data has recently witnessed increasing attention and progress. State-of-the-art approaches adopting Graph Convolutional Networks (GCNs) can effectively extract features on human skeletons relying on the pre-defined human topology. Despite associated progress, GCN-based methods have difficulty generalizing...
Uploaded on: December 3, 2022 -
December 15, 2021 (v1) Conference paper
Action recognition based on human pose has witnessed increasing attention due to its robustness to changes in appearances, environments, and viewpoints. Despite associated progress, one remaining challenge has to do with occlusion in real-world videos that hinders the visibility of all joints. Such occlusion impedes representation of such...
Uploaded on: December 3, 2022 -
January 19, 2023 (v1) Publication
Video anomaly detection in surveillance systems with only video-level labels (i.e., weakly supervised) is challenging. This is due to (i) the complex integration of human- and scene-based anomalies comprising subtle and sharp spatio-temporal cues in real-world scenarios, (ii) non-optimal optimization between normal and anomaly instances under...
Uploaded on: February 22, 2023 -
January 2, 2024 (v1) Conference paper
Video anomaly detection in real-world scenarios is challenging due to the complex temporal blending of long and short-length anomalies with normal ones. Further, such anomalies are more difficult to detect due to: (i) distinctive features characterizing the short and long anomalies with sharp and progressive temporal cues, respectively; (ii) lack of...
Uploaded on: April 5, 2025 -
January 5, 2021 (v1) Conference paper
Handling long and complex temporal information is an important challenge for action detection tasks. This challenge is further aggravated by densely distributed actions in untrimmed videos. Previous action detection methods fail to select the key temporal information in long videos. To this end, we introduce the Dilated Attention Layer...
Uploaded on: December 4, 2022 -
2022 (v1) Journal article
Designing activity detection systems that can be successfully deployed in daily-living environments requires datasets that pose the challenges typical of real-world scenarios. In this paper, we introduce a new untrimmed daily-living dataset that features several real-world challenges: Toyota Smarthome Untrimmed (TSU). TSU contains a wide...
Uploaded on: December 3, 2022 -
October 2, 2023 (v1) Conference paper
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them with a temporal model to classify frame-wise actions. However, their performance remains limited as the visual features...
Uploaded on: February 24, 2024 -
October 2, 2023 (v1) Conference paper
Skeleton-based action segmentation requires recognizing composable actions in untrimmed videos. Current approaches decouple this problem by first extracting local visual features from skeleton sequences and then processing them with a temporal model to classify frame-wise actions. However, their performance remains limited as the visual features...
Uploaded on: October 13, 2023 -
October 27, 2019 (v1) Conference paper
The performance of deep neural networks is strongly influenced by the quantity and quality of annotated data. Most of the large activity recognition datasets consist of data sourced from the web, which does not reflect challenges that exist in activities of daily living. In this paper, we introduce a large real-world video dataset for...
Uploaded on: December 4, 2022 -
February 7, 2023 (v1) Conference paper
Self-supervised video representation learning has aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time. This leads to a loss of pertinent information related to temporal relationships, rendering actions such as 'enter' and 'leave' indistinguishable. To mitigate this...
Uploaded on: October 13, 2023