Gradient scarcity with Bilevel Optimization for Graph Learning
- Creators
- Ghanem, Hashem
- Vaiter, Samuel
- Keriven, Nicolas
- Others:
- Institut de Mathématiques de Bourgogne [Dijon] (IMB) ; Université de Bourgogne (UB)-Université Bourgogne Franche-Comté [COMUE] (UBFC)-Centre National de la Recherche Scientifique (CNRS)
- Centre National de la Recherche Scientifique (CNRS)
- Laboratoire Jean Alexandre Dieudonné (JAD) ; Université Nice Sophia Antipolis (1965-2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) ; Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes)-Institut National des Sciences Appliquées (INSA)-Université de Bretagne Sud (UBS)-École normale supérieure - Rennes (ENS Rennes)-Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique ; Institut Mines-Télécom [Paris] (IMT)
- ANR-18-CE40-0005, GraVa, Variational methods for signals on graphs (2018)
- ANR-21-CE23-0006, GRandMa, Random graphs in statistical learning (2021)
Description
A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity: learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from the labelled ones to receive zero gradients. The phenomenon was first described when jointly optimizing the graph and the weights of a Graph Neural Network (GNN). In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GNNs gradient scarcity occurs because of their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that the amplitude of the gradients decreases exponentially with the distance to the labelled nodes. To alleviate this issue, we study several solutions: latent graph learning using a Graph-to-Graph model (G2G), graph regularization to impose a prior structure on the graph, and optimizing on a larger graph than the original one, with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and demonstrate the effectiveness of the proposed solutions.
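To make the two regimes described above concrete, here is a minimal PyTorch sketch. It is not the authors' code: the path graph, feature dimensions, fixed weights, regularization strength, and choice of labelled nodes are all invented for illustration. The first part shows the exact zero gradients caused by the finite receptive field of a 2-layer GNN; the second differentiates an outer loss through the closed-form solution of a Laplacian-regularized inner problem and shows gradients that decay with distance instead of vanishing exactly.

```python
import torch

# Toy setting: a path graph 0 - 1 - ... - 7 with one learnable weight per
# edge, and a single labelled node (node 0). All sizes are illustrative.
n = 8
edges = [(i, i + 1) for i in range(n - 1)]

torch.manual_seed(0)
X = torch.randn(n, 4)                           # node features
W1, W2 = torch.randn(4, 4), torch.randn(4, 2)   # fixed GNN weights
label = torch.tensor([0])                       # class of node 0

# --- Regime 1: a 2-layer GNN has a 2-hop receptive field, so edges more
# than 2 hops from the labelled node receive *exactly* zero gradient.
w = torch.ones(len(edges), requires_grad=True)
A = torch.eye(n)                                # adjacency with self-loops
for k, (i, j) in enumerate(edges):
    A[i, j] = w[k]
    A[j, i] = w[k]
A_hat = A / A.sum(dim=1, keepdim=True)          # random-walk normalization

H = torch.relu(A_hat @ X @ W1)                  # layer 1
logits = A_hat @ H @ W2                         # layer 2
loss = torch.nn.functional.cross_entropy(logits[[0]], label)
loss.backward()
print(w.grad)  # first two entries generically nonzero, the rest exactly 0

# --- Regime 2: Laplacian regularization. The inner problem
#   f* = argmin_f ||f_0 - y_0||^2 + lam * f^T L f
# has a closed-form solution; differentiating the outer loss through the
# linear solve yields gradients that decay with distance but never vanish.
w = torch.ones(len(edges), requires_grad=True)
Wadj = torch.zeros(n, n)
for k, (i, j) in enumerate(edges):
    Wadj[i, j] = w[k]
    Wadj[j, i] = w[k]
L = torch.diag(Wadj.sum(dim=1)) - Wadj          # graph Laplacian

lam = 1.0
mask = torch.zeros(n); mask[0] = 1.0            # node 0 labelled (inner loss)
y = torch.zeros(n);    y[0] = 1.0
f = torch.linalg.solve(torch.diag(mask) + lam * L, mask * y)

outer = (f[1] - 1.0) ** 2                       # outer loss on another node
outer.backward()
print(w.grad.abs())  # nonzero on every edge, shrinking rapidly with distance
```

In the bilevel case the gradient on every edge is formally nonzero, which matches the exponential-decay characterization: far edges are not strictly stuck, but their gradients are too small for them to be learned in practice.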
Additional details
- URL
- https://hal.science/hal-04041721
- URN
- urn:oai:HAL:hal-04041721v1
- Origin repository
- UNICA