Published September 20, 2021
| Version v1
Journal article
Identifying homogeneous subgroups of patients and important features: a topological machine learning approach
Contributors
Others:
- Institute of Psychiatry, Psychology & Neuroscience, King's College London ; King's College London
- Understanding the Shape of Data (DATASHAPE) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Saclay - Ile de France ; Institut National de Recherche en Informatique et en Automatique (Inria)
- Laboratoire de Mathématiques Jean Leray (LMJL) ; Centre National de la Recherche Scientifique (CNRS)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST) ; Nantes Université - pôle Sciences et technologie ; Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie ; Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)
Description
Background: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.
Results: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.
Conclusions: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.
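To illustrate how Mapper reduces a point cloud to a one-dimensional graph, the following is a minimal sketch of the general Mapper idea, not the authors' pipeline. It assumes a scalar filter function, a cover of the filter range by equal-length overlapping intervals, and a simple distance-threshold single-linkage clustering inside each interval. The function names (`interval_cover`, `single_linkage`, `mapper_graph`) and the parameters `n_intervals`, `overlap`, and `eps` are hypothetical choices made for this example. The bootstrap and cluster-selection steps described above are not shown.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals equal-length intervals.

    `overlap` is the fraction of each interval shared with its neighbour.
    """
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    # Pin the last endpoint to hi so rounding cannot exclude the maximum.
    cover[-1] = (cover[-1][0], hi)
    return cover


def single_linkage(points, indices, eps):
    """Split `indices` into connected components of the eps-neighbour graph."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper_graph(points, filter_fn, n_intervals=4, overlap=0.3, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes whenever they share at least one point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, members, eps))
    edges = {
        (s, t)
        for s, t in combinations(range(len(nodes)), 2)
        if nodes[s] & nodes[t]
    }
    return nodes, edges


# Toy data (made up for illustration): two groups separated along the x axis.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
nodes, edges = mapper_graph(points, filter_fn=lambda p: p[0], eps=1.5)
```

In this toy run the two groups end up in separate connected components of the graph, which is the sense in which Mapper components can be read as candidate clusters. A real analysis would typically use an established implementation and a clustering method suited to the data.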
Abstract
International audience
Additional details
Identifiers
- URL
- https://hal.inria.fr/hal-03368489
- URN
- urn:oai:HAL:hal-03368489v1
Origin repository
- Origin repository
- UNICA