Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning
- Others:
- E-Patient : Images, données & mOdèles pour la médeciNe numériquE (EPIONE) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- Accenture Labs [Sophia Antipolis]
- This work has been supported by the French government, through the 3IA Côte d'Azur Investments in the Future project managed by the National Research Agency (ANR) with the reference number ANR-19-P3IA-0002, and by the ANR JCJC project Fed-BioMed 19-CE45-0006-01. The project was also supported by Accenture. The authors are grateful to the OPAL infrastructure from Université Côte d'Azur for providing resources and support.
- ANR-19-P3IA-0002,3IA@cote d'azur,3IA Côte d'Azur(2019)
- ANR-19-CE45-0006,FED-BIOMED,Apprentissage statistique fédéré pour une nouvelle generation de méta-analyses de données biomédicales sécurisés et à grande échelle(2019)
Description
This work addresses the problem of optimizing communications between server and clients in federated learning (FL). Current sampling approaches in FL are either biased or suboptimal in terms of server-client communications and training stability. To overcome this issue, we introduce clustered sampling for client selection. We prove that clustered sampling leads to better client representativity and to reduced variance of the clients' stochastic aggregation weights in FL. Consistent with our theory, we provide two different clustering approaches enabling client aggregation based on 1) sample size, and 2) model similarity. Through a series of experiments in non-iid and unbalanced scenarios, we demonstrate that model aggregation through clustered sampling consistently leads to better training convergence and reduced variability when compared to standard sampling approaches. Our approach does not require any additional operation on the client side, and can be seamlessly integrated into standard FL implementations. Finally, clustered sampling is compatible with existing methods and technologies for privacy enhancement, and for communication reduction through model compression.
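As a rough illustration of the sample-size-based variant, the sketch below builds m sampling distributions and compares the variance of each client's aggregation weight against plain multinomial sampling. The description above does not spell out the construction, so everything here is an assumption: the greedy bin-filling scheme, the rule of drawing exactly one client per distribution, and the function names `clustered_distributions` and `weight_variances` are hypothetical and are not the authors' reference implementation.

```python
from fractions import Fraction


def clustered_distributions(sample_sizes, m):
    """Build m sampling distributions over clients (assumed construction).

    Client i carries a mass of m * n_i units. Clients are poured, largest
    first, into m bins of capacity sum(n). Row k of the result is the
    probability of drawing each client from the k-th distribution.
    """
    total = sum(sample_sizes)
    order = sorted(range(len(sample_sizes)), key=lambda i: -sample_sizes[i])
    bins = [[0] * len(sample_sizes) for _ in range(m)]
    k, room = 0, total
    for i in order:
        mass = m * sample_sizes[i]
        while mass > 0:
            take = min(mass, room)
            bins[k][i] += take
            mass -= take
            room -= take
            if room == 0 and k < m - 1:
                k, room = k + 1, total  # current bin is full, open the next
    return [[Fraction(a, total) for a in row] for row in bins]


def weight_variances(sample_sizes, m):
    """Variance of each client's aggregation weight (times selected / m).

    Returns (multinomial, clustered). Multinomial sampling draws m clients
    i.i.d. with probability p_i; clustered sampling draws one client from
    each of the m distributions, independently.
    """
    total = sum(sample_sizes)
    p = [Fraction(n, total) for n in sample_sizes]
    dists = clustered_distributions(sample_sizes, m)
    multinomial = [pi * (1 - pi) / m for pi in p]
    clustered = [
        sum(row[i] * (1 - row[i]) for row in dists) / (m * m)
        for i in range(len(sample_sizes))
    ]
    return multinomial, clustered


sizes = [10, 20, 30, 40]
md_var, cl_var = weight_variances(sizes, m=2)
for i, (a, b) in enumerate(zip(md_var, cl_var)):
    print(f"client {i}: multinomial {float(a):.3f}, clustered {float(b):.3f}")
```

Under these assumptions each distribution sums to 1 and each client's probabilities sum to m times its data share across the m distributions, so the expected aggregation weight matches multinomial sampling while the variance is never larger. Exact fractions are used only to keep those checks free of floating-point error.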
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-03232421
- URN
- urn:oai:HAL:hal-03232421v1
- Origin repository
- UNICA