Scheduling with Fully Compressible Tasks: Application to Deep Learning Inference with Neural Network Compression
- Others:
- Combinatorics, Optimization and Algorithms for Telecommunications (COATI) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
- Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
- Université Côte d'Azur (UniCA)
- Laboratoire d'Informatique, Signaux, et Systèmes de Sophia-Antipolis (I3S) / Equipe SIGNET ; COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
- Centre National de la Recherche Scientifique (CNRS)
- IEEE/ACM
- ANR-15-IDEX-0001,UCA JEDI,Idex UCA JEDI(2015)
- ANR-17-EURE-0004,UCA DS4H,UCA Systèmes Numériques pour l'Homme(2017)
- ANR-19-CE25-0001,ARTIC,Contrôle basé sur l'Intelligence Artificielle de réseau en nuage(2019)
- ANR-22-PEFT-0002,NF-MUST,end-to-end MUlti-domain Service managemenT architectures (MUST)(2022)
- ANR-23-PECL-0003,CARECloud,Comprendre, Améliorer, Réduire les impacts Environnementaux du Cloud computing(2023)
Description
With the advent and growing use of Machine Learning as a Service (MLaaS), cloud and network systems now offer the possibility of deploying ML tasks on heterogeneous clusters. Network and cloud operators must then schedule these tasks, determining both when and on which devices to execute them. In parallel, several solutions, such as neural network compression, have been proposed to build small models that can run on limited hardware. These solutions allow the model size to be chosen at inference time for any targeted processing time, without having to retrain the network. In this work, we consider the Deadline Scheduling with Compressible Tasks (DSCT) problem: a novel scheduling problem with task deadlines in which the tasks can be compressed. Each task can be executed at a chosen compression level, which presents a trade-off between its compression level (and thus its processing time) and the utility it obtains. The objective is to maximize the total utility of the tasks. We propose an approximation algorithm with proven guarantees to solve the problem. We validate its efficiency with extensive simulations, obtaining near-optimal results. As an application scenario, we study the problem when the tasks are Deep Learning classification jobs and the objective is to maximize their global accuracy, but we believe that this new framework and its solutions apply to a wide range of use cases.
Abstract
International audience
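The trade-off described in the abstract (a more compressed task runs faster but yields less utility) can be illustrated on a toy instance. The sketch below is not the approximation algorithm of the paper, which is not detailed in this record. It is a brute-force search under simplifying assumptions: a single machine, all tasks available at time 0, no task dropped, and a small discrete set of compression levels per task. The instance data and the names `TASKS`, `feasible`, and `best_compression` are hypothetical.

```python
from itertools import product

# Hypothetical toy instance: each task has a deadline and a list of
# (processing_time, utility) options, one per compression level.
TASKS = [
    {"deadline": 4, "levels": [(4, 0.95), (2, 0.90), (1, 0.70)]},
    {"deadline": 5, "levels": [(3, 0.92), (2, 0.85), (1, 0.60)]},
    {"deadline": 6, "levels": [(3, 0.97), (1, 0.80)]},
]


def feasible(tasks, choice):
    """Check all deadlines when tasks run back to back on one machine.

    Tasks are ordered by earliest deadline first, which is sufficient to
    decide feasibility when every task is available at time 0.
    """
    order = sorted(range(len(tasks)), key=lambda i: tasks[i]["deadline"])
    clock = 0
    for i in order:
        clock += tasks[i]["levels"][choice[i]][0]
        if clock > tasks[i]["deadline"]:
            return False
    return True


def best_compression(tasks):
    """Exhaustively pick one level per task to maximize total utility.

    Assumption: brute force stands in for the paper's algorithm and is
    only practical for very small instances.
    """
    best_choice, best_utility = None, float("-inf")
    for choice in product(*(range(len(t["levels"])) for t in tasks)):
        if not feasible(tasks, choice):
            continue
        utility = sum(t["levels"][c][1] for t, c in zip(tasks, choice))
        if utility > best_utility:
            best_choice, best_utility = choice, utility
    return best_choice, best_utility
```

Calling `best_compression(TASKS)` returns the index of the selected compression level for each task together with the total utility. On this instance, running every task uncompressed misses the deadlines, so at least some tasks have to be compressed.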
Additional details
- URL: https://hal.science/hal-04497548
- URN: urn:oai:HAL:hal-04497548v1
- Origin repository: UNICA