FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
- Creators
- Terrail, Jean Ogier Du
- Ayed, Samy-Safwan
- Cyffers, Edwige
- Grimberg, Felix
- He, Chaoyang
- Loeb, Regis
- Mangold, Paul
- Marchand, Tanguy
- Marfoq, Othmane
- Mushtaq, Erum
- Muzellec, Boris
- Philippenko, Constantin
- Silva, Santiago
- Teleńczuk, Maria
- Albarqouni, Shadi
- Avestimehr, Salman
- Bellet, Aurélien
- Dieuleveut, Aymeric
- Jaggi, Martin
- Karimireddy, Sai Praneeth
- Lorenzi, Marco
- Neglia, Giovanni
- Tommasi, Marc
- Andreux, Mathieu
- Others:
- Owkin France
- Université Côte d'Azur (UCA)
- Machine Learning in Information Networks (MAGNET) ; Inria Lille - Nord Europe ; Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL) ; Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
- Ecole Polytechnique Fédérale de Lausanne (EPFL)
- FedML, Inc (FedML)
- Network Engineering and Operations (NEO) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- University of Southern California (USC)
- Centre de Mathématiques Appliquées - Ecole Polytechnique (CMAP) ; École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)
- University Hospital Bonn
- Helmholtz Munich, Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
- University of California [Berkeley] (UC Berkeley) ; University of California (UC)
- E-Patient : Images, données & mOdèles pour la médeciNe numériquE (EPIONE) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)
- ANR-20-CE23-0015,PRIDE,Apprentissage automatique décentralisé et préservant la vie privée(2020)
Description
Federated Learning (FL) is a novel approach that enables several clients holding sensitive data to collaboratively train machine learning models without centralizing the data. The cross-silo FL setting corresponds to the case of a few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, which slows algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and reuse the different components for their research. FLamby is available at www.github.com/owkin/flamby.
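To make the cross-silo setting described above concrete, the following is a minimal sketch of federated averaging (FedAvg), one commonly used FL algorithm, on a toy one-dimensional regression problem. It is an illustration under stated assumptions only: the function names (`local_update`, `fedavg`), the toy data, and the hyperparameters are hypothetical and are not taken from FLamby's code or benchmark configuration.

```python
# Hypothetical illustration of federated averaging (FedAvg) on a toy
# 1-D linear model y = w * x. Not FLamby's API or benchmark code.

def local_update(w, data, lr=0.05, steps=20):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg(clients, rounds=30, w0=0.0):
    """Average locally trained weights, weighted by client dataset size."""
    w = w0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client trains locally; only model weights leave the silo.
        local_ws = [local_update(w, d) for d in clients]
        w = sum(len(d) * lw for d, lw in zip(clients, local_ws)) / total
    return w

# Three silos of different sizes and input ranges, all following y = 3x.
clients = [
    [(0.5, 1.5), (1.0, 3.0)],
    [(1.5, 4.5), (2.0, 6.0), (2.5, 7.5)],
    [(0.2, 0.6)],
]
w_global = fedavg(clients)
print(f"global weight after training: {w_global:.4f}")
```

In this sketch the raw `(x, y)` pairs never leave their client; the server only sees the locally updated weights, which is the property that motivates FL for sensitive healthcare data.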
Abstract
International audience
Additional details
- URL
- https://hal.science/hal-03900026
- URN
- urn:oai:HAL:hal-03900026v1
- Origin repository
- UNICA