Published June 21, 2023 | Version v1
Publication

Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA

Description

Topological data analysis (TDA) is a rapidly growing area of data science which uses the geometry and topology of data sets to produce qualitative multi-scale shape descriptors for subsequent statistical and machine learning tasks. The most common descriptor in TDA is persistent homology, which tracks the topological changes in growing families of subsets of the data set itself, called filtrations, and encodes them in an algebraic object, called a persistence module. The algorithmic and theoretical properties of persistence modules are now well understood in the single-parameter case, that is, when there is only one filtration (e.g., feature scale) to study. In contrast, much less is known in the multi-parameter case, where several filtrations (e.g., scale and density) are used simultaneously. Indeed, the resulting multi-parameter persistence modules are much more complicated and intricate, which dramatically impedes the study of their theoretical properties. However, they usually encode information that is invisible to their single-parameter counterparts, and are thus much more useful descriptors for applications in data science. As a consequence, a lot of attention has been devoted to the construction of tractable and stable proxies for multi-parameter persistence modules. However, most of the proposed approaches in the literature are still prohibitively expensive to compute on large-scale data, many are limited to at most two filtrations, and the most tractable sacrifice much of the richness of the multi-parameter setting.
In this article, we introduce a new parameterized family of topological invariants, taking the form of candidate decompositions, for multi-parameter persistence modules. We prove that our candidate decompositions are controllable approximations: when restricting to modules that can be decomposed into interval summands, we establish theoretical results about the approximation error between our candidate decompositions and the true underlying module in terms of the standard interleaving and bottleneck distances. Moreover, even when the underlying module does not admit such a decomposition, our candidate decompositions are nonetheless stable invariants; small perturbations in the underlying module lead to small perturbations in the candidate decomposition. Then, we introduce MMA (Multipersistence Module Approximation): an algorithm for computing stable instances of such invariants, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence. By design, MMA can handle an arbitrary number of filtrations, and has bounded complexity and running time. Finally, we present empirical evidence validating the generalization capabilities and running time speed-ups of MMA on several data sets.

Additional details

Identifiers

URL
https://inria.hal.science/hal-03689199
URN
urn:oai:HAL:hal-03689199v2

Origin repository
UNICA