Published 2020
| Version v1
Conference paper
Learning sparse deep neural networks using efficient structured projections on convex constraints for green AI
Creators
Contributors
Others:
- Signal, Images et Systèmes (SIS), Laboratoire d'Informatique, Signaux et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (UNS), COMUE Université Côte d'Azur, Centre National de la Recherche Scientifique (CNRS), Université Côte d'Azur (UCA)
- Orange Labs [Sophia Antipolis] ; Orange Labs
Description
Deep neural networks (DNNs) have recently been applied to many domains, where they outperform classical state-of-the-art methods. However, this high performance is most often obtained with networks containing millions of parameters, whose training requires substantial computational power. Proximal regularization methods have been proposed in the literature to deal with this computational issue, but they are time consuming.

In this paper, we propose instead a constrained approach and provide the general framework for this new projected gradient method. Our algorithm iterates a gradient step and a projection onto convex constraints. We study algorithms for different constraints: the classical unstructured $\ell_1$ constraint and structured constraints such as the $\ell_{2,1}$ constraint (Group LASSO). We propose a new structured $\ell_{1,1}$ constraint, for which we provide a new projection algorithm. Finally, we use the recent "Lottery optimizer", replacing the threshold by our $\ell_{1,1}$ projection. We demonstrate the effectiveness of this method on three popular datasets (MNIST, Fashion-MNIST and CIFAR). Experiments on these datasets show that our projection method with the new structured $\ell_{1,1}$ constraint provides the best decrease in memory and computational power.
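As an illustration of the projection step the abstract describes, the following is a minimal sketch of the standard Euclidean projection onto the $\ell_1$ ball (the classical unstructured case), written in plain Python. This is the well-known sort-and-threshold algorithm, not the paper's new $\ell_{1,1}$ structured projection; the function name and the projected-gradient usage shown in the comment are illustrative assumptions, not the authors' code.

```python
import math

def project_l1_ball(v, z=1.0):
    """Euclidean projection of vector v onto the l1-ball of radius z.

    Uses the classical sort-based algorithm: sort magnitudes in
    decreasing order, find the soft-thresholding level theta, then
    shrink each coordinate toward zero by theta.
    """
    if z <= 0:
        raise ValueError("radius z must be positive")
    # Already inside the ball: nothing to do.
    if sum(abs(x) for x in v) <= z:
        return list(v)
    # Sort absolute values in decreasing order.
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - z) / j
        if uj - t > 0:
            theta = t  # largest feasible threshold so far
    # Soft-threshold each coordinate, keeping its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]

# In a projected-gradient loop this would be applied after each
# gradient step, e.g.:  w = project_l1_ball([wi - lr * gi
#                           for wi, gi in zip(w, grad)], z)
```

The projection sets many coordinates exactly to zero, which is what induces sparsity in the trained network; the structured $\ell_{2,1}$ and $\ell_{1,1}$ constraints of the paper do the same at the level of whole rows or groups of weights.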
Abstract
International audience
Additional details
Identifiers
- URL
- https://hal.archives-ouvertes.fr/hal-02556382
- URN
- urn:oai:HAL:hal-02556382v3
Origin repository
- UNICA