Published February 7, 2023 | Version v1
Publication

Sparse GEMINI for joint discriminative clustering and feature selection

Others:
Modèles et algorithmes pour l'intelligence artificielle (MAASAI) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria) ; Laboratoire Jean Alexandre Dieudonné (LJAD) ; Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; Centre National de la Recherche Scientifique (CNRS) ; Université Côte d'Azur (UCA)
Laboratoire Jean Alexandre Dieudonné (LJAD) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)
Centre de recherche du CHU de Québec-Université Laval (CRCHUQ) ; CHU de Québec–Université Laval ; Université Laval [Québec] (ULaval)
France Canada Research Fund
ANR-19-P3IA-0002, 3IA@cote d'azur, 3IA Côte d'Azur (2019)
European Project: H2020-951911, H2020-EU.2.1.1. - INDUSTRIAL LEADERSHIP - Leadership in enabling and industrial technologies - Information and Communication Technologies (ICT), AI4Media (2020)

Description

Feature selection in clustering is a hard task: it requires discovering both the relevant clusters and the variables that are relevant to those clusters. While feature selection algorithms are often model-based, relying on optimised model selection or strong assumptions on p(x), we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information, called GEMINI, with a simple ℓ1 penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while only requiring the design of a clustering model p_θ(y|x). We demonstrate the performance of Sparse GEMINI on synthetic datasets as well as large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and is able to select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
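The abstract does not spell out the GEMINI objective or the exact form of the penalty. The general recipe can still be illustrated: a discriminative model p_θ(y|x) is trained to maximise a mutual-information-style objective while a sparsity penalty prunes input features. The following is a minimal NumPy sketch under loudly stated assumptions. It swaps in the ordinary Shannon mutual information for GEMINI, so there is no geometry-aware distance term. It uses a linear softmax classifier for p_θ(y|x) and applies a group-ℓ1 penalty on each feature's row of weights via proximal gradient ascent. All function names and hyperparameters (`fit_sparse_mi`, `lam`, `lr`, `n_iter`, ...) are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax, shifted by each row's max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mutual_information(p, eps=1e-12):
    # Empirical Shannon MI between inputs and cluster assignments
    # (a stand-in for GEMINI): I(X; Y) ~= H(mean_i p_i) - mean_i H(p_i).
    p_bar = p.mean(axis=0)
    h_marginal = -np.sum(p_bar * np.log(p_bar + eps))
    h_conditional = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    return h_marginal - h_conditional


def mi_gradients(X, W, b, eps=1e-12):
    # Analytic gradient of the MI w.r.t. W (d x K) and b (K,) for the
    # hypothetical linear softmax model p_theta(y|x) = softmax(x W + b).
    n = X.shape[0]
    p = softmax(X @ W + b)
    p_bar = p.mean(axis=0)
    # dMI/dp_ik = (log p_ik - log p_bar_k) / n
    g = (np.log(p + eps) - np.log(p_bar + eps)) / n
    # Chain rule through the softmax: dz_ij = p_ij * (g_ij - sum_k p_ik g_ik).
    dz = p * (g - np.sum(p * g, axis=1, keepdims=True))
    return X.T @ dz, dz.sum(axis=0)


def prox_group_l1(W, threshold):
    # Proximal operator of threshold * sum_d ||W[d, :]||_2 (group soft-thresholding).
    # Each feature's row is shrunk toward zero; rows whose norm is below the
    # threshold become exactly zero, which removes that feature from the model.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return W * scale


def fit_sparse_mi(X, n_clusters, lam=0.05, lr=0.5, n_iter=500, seed=0):
    # Proximal gradient ascent on MI(theta) - lam * sum_d ||W[d, :]||_2.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_clusters))
    b = np.zeros(n_clusters)
    for _ in range(n_iter):
        gW, gb = mi_gradients(X, W, b)
        # Ascent step on the smooth MI term, then the prox step for the penalty
        # (the bias is left unpenalised).
        W = prox_group_l1(W + lr * gW, lr * lam)
        b = b + lr * gb
    # Features whose weight row survived the thresholding are "selected".
    selected = np.flatnonzero(np.linalg.norm(W, axis=1) > 0)
    return W, b, selected
```

A call such as `W, b, selected = fit_sparse_mi(X, n_clusters=3)` returns the weights, the biases, and the indices of the features whose weight rows were not zeroed out; a larger `lam` tends to prune more features. The proximal step is used rather than a plain subgradient of the penalty because it produces exact zeros, which is what makes the surviving rows read directly as a selected feature subset.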

Additional details

Created:
October 11, 2023
Modified:
November 30, 2023