December 2, 2024 (v1)
Conference paper
Vision Transformers (ViTs) have shown promising results in computer vision tasks, challenging CNN architectures on image classification, segmentation and object detection. However, their quadratic complexity O(N^2), where N is the token sequence length, hinders their deployment on edge devices. To tackle this challenge, researchers have...
Uploaded on: January 13, 2025