Unsupervised Clustering on Multi-Components Datasets: Applications on Images and Astrophysics Data
- Others:
- Laboratoire d'Informatique, Signaux, et Systèmes de Sophia-Antipolis (I3S) / Equipe SIGNAL ; Signal, Images et Systèmes (Laboratoire I3S - SIS) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)
- GIPSA - Communication, Signal et Sécurité (GIPSA-C2S) ; Département Images et Signal (GIPSA-DIS) ; Grenoble Images Parole Signal Automatique (GIPSA-lab) ; Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Grenoble Images Parole Signal Automatique (GIPSA-lab) ; Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Stendhal - Grenoble 3-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)
Description
This paper proposes an original approach to cluster multi-component data sets with an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding the Prim's trajectory. The corresponding cluster centroids are then computed in order to initialize the Generalized Lloyd's algorithm, also known as K-means, which allows to circumvent initialization problems. Metrics used for measuring similarity between multi-dimensional data points are based on symmetrical divergences. The use of these informational divergences together with the proposed method lead to better results than some other clustering methods in the framework of astrophysical data processing. An application of this method in the multi-spectral imagery domain with a satellite view of Paris is also presented.
Abstract
5 pages
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-00339542
- URN
- urn:oai:HAL:hal-00339542v1
- Origin repository
- UNICA