Vision Transformers (ViTs) have achieved impressive results in computer vision, excelling in tasks such as image classification, segmentation, and object detection. However, their quadratic complexity $O(N^2)$, where $N$ is the token sequence length, poses challenges when deployed on resource-limited devices. To address this issue, dynamic...
-
February 26, 2025 (v1) Conference paper
Uploaded on: April 5, 2025
-
December 2, 2024 (v1) Conference paper
Vision Transformers (ViTs) have shown promising results in computer vision tasks, challenging CNN architectures on image classification, segmentation, and object detection. However, their quadratic complexity $O(N^2)$, where $N$ is the token sequence length, hinders their deployment on edge devices. To tackle this challenge, researchers have...
Uploaded on: January 13, 2025