Importance sampling strategy for non-convex randomized block-coordinate descent
- Others:
- Joseph Louis LAGRANGE (LAGRANGE) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Institut national des sciences de l'Univers (INSU - CNRS)-Observatoire de la Côte d'Azur ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Université Côte d'Azur (UCA)-Centre National de la Recherche Scientifique (CNRS)
- Observatoire de la Côte d'Azur (OCA) ; Institut national des sciences de l'Univers (INSU - CNRS)-Centre National de la Recherche Scientifique (CNRS)
- Equipe Apprentissage (DocApp - LITIS) ; Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS) ; Université Le Havre Normandie (ULH) ; Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN) ; Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie) ; Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)
Description
As the number of samples and the dimensionality of optimization problems in statistics and machine learning explode, block coordinate descent algorithms have gained popularity, since they reduce the original problem to several smaller ones. The coordinates to be optimized are usually selected randomly according to a given probability distribution. We introduce an importance sampling strategy that helps randomized coordinate descent algorithms focus on the blocks that are still far from convergence. The framework applies to problems composed of the sum of two possibly non-convex terms, one of which is separable and non-smooth. We have compared our algorithm to a full-gradient proximal approach, to a randomized block coordinate algorithm with uniform sampling, and to cyclic block coordinate descent. Experimental evidence shows the clear benefit of using an importance sampling strategy.
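The following is a minimal sketch of randomized proximal block-coordinate descent with an importance-sampling rule. It targets problems of the form F(x) = f(x) + Σ_j g_j(x_j), where f is smooth and g is separable and non-smooth. It is not the authors' algorithm. The sampling rule, the function names (`soft_threshold`, `objective`, `prox_bcd_importance`) and the parameter `mix` are all hypothetical.

Under this rule, a block is drawn with probability proportional to the size of its most recent proximal step, used as a proxy for "still far from convergence". That probability is mixed with a uniform distribution so that every block remains reachable.

For simplicity the test problem is a convex lasso, f(x) = ½‖Ax − b‖², g = λ‖x‖₁. The same block update, a proximal step with step size 1/L_j, can also be applied when f is non-convex with block-Lipschitz gradients.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, x, lam):
    """F(x) = 0.5 * ||Ax - b||^2 + lam * ||x||_1 (illustrative test problem)."""
    r = A @ x - b
    return 0.5 * r @ r + lam * np.abs(x).sum()


def prox_bcd_importance(A, b, lam, n_blocks, n_iter=4000, mix=0.1, seed=0):
    """Randomized proximal block-coordinate descent with a HYPOTHETICAL
    importance-sampling rule: block j is drawn with probability
        p_j = (1 - mix) * s_j / sum(s) + mix / n_blocks,
    where s_j is the norm of the last proximal step taken on block j.
    mix=1.0 gives plain uniform sampling.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    blocks = np.array_split(np.arange(n), n_blocks)
    # Block Lipschitz constants of the smooth term's gradient: ||A_j||_2^2.
    L = np.array([np.linalg.norm(A[:, idx], 2) ** 2 for idx in blocks])
    x = np.zeros(n)
    r = A @ x - b                    # residual Ax - b, updated incrementally
    scores = np.ones(n_blocks)       # equal initial scores: first draw is uniform
    for _ in range(n_iter):
        w = scores + 1e-12           # avoid 0/0 once every block has converged
        p = (1 - mix) * w / w.sum() + mix / n_blocks
        j = rng.choice(n_blocks, p=p)
        idx = blocks[j]
        grad_j = A[:, idx].T @ r     # partial gradient of the smooth term
        # Proximal step on block j only, with step size 1 / L_j.
        new_xj = soft_threshold(x[idx] - grad_j / L[j], lam / L[j])
        delta = new_xj - x[idx]
        r += A[:, idx] @ delta       # keep the residual consistent with x
        x[idx] = new_xj
        scores[j] = np.linalg.norm(delta)
    return x


# Usage on a random lasso instance (synthetic data, for illustration only).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 40))
b = rng.standard_normal(200)
x_is = prox_bcd_importance(A, b, lam=0.5, n_blocks=8)
x_unif = prox_bcd_importance(A, b, lam=0.5, n_blocks=8, mix=1.0)
print(objective(A, b, x_is, 0.5), objective(A, b, x_unif, 0.5))
```

Only the score of the block just updated is refreshed, so the other scores can be stale. The uniform `mix` term is what guarantees that such blocks are still revisited. Setting `mix=1.0` yields the uniform-sampling baseline for comparison.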
Abstract
International audience
Additional details
- URL
- https://hal.science/hal-01336588
- URN
- urn:oai:HAL:hal-01336588v1
- Origin repository
- UNICA