Statistical learning is today playing an increasing role in many scientific fields as varied as medicine, imagery, biology, and astronomy. Scientific advances in recent years have significantly increased measurement and calculation capabilities, and it is now difficult for a human operator to process such data exhaustively in a timely manner....
-
March 18, 2020 (v1)Book section
Uploaded on: December 4, 2022 -
October 18, 2018 (v1)Book section
With the increase in measurement capabilities, many medical disciplines have seen their practices deeply modified because of the dimensionality of the data. Although these technical improvements promise significant advances in medical research, the statistical learning methods must be able to cope with the problems encountered in those...
Uploaded on: December 4, 2022 -
April 13, 2023 (v1)Publication
Numerical interactions leading to users sharing textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand those heterogeneous and complex data structures, clustering nodes into homogeneous groups as well as rendering a...
Uploaded on: April 16, 2023 -
July 17, 2020 (v1)Conference paper
We consider here the problem of co-clustering count matrices with a high level of missing values that may evolve over time. We introduce a generative model, named the dynamic latent block model (dLBM), which handles this situation and extends the classical binary latent block model (LBM) to the dynamic case. The modeling of the dynamic time...
Uploaded on: December 4, 2022 -
May 23, 2021 (v1)Journal article
High-dimensional data clustering has become and remains a challenging task for modern statistics and machine learning, with a wide range of applications. We consider in this work the powerful discriminative latent mixture model, and we extend it to the Bayesian framework. Modeling data as a mixture of Gaussians in a low-dimensional...
Uploaded on: December 4, 2022 -
July 17, 2019 (v1)Journal article
International audience
Uploaded on: December 4, 2022 -
June 7, 2020 (v1)Conference paper
We consider the problem of co-clustering binary matrices that may evolve over time, and we introduce a generative model to handle it. The proposed model, called the dynamic latent block model, extends the classical binary latent block model to the dynamic case. The modeling of the continuous-time dynamics relies on a...
Uploaded on: December 4, 2022 -
2023 (v1)Publication
Communication networks such as emails or social networks are now ubiquitous and their analysis has become a strategic field. In many applications, the goal is to automatically extract relevant information by looking at the nodes and their connections. Unfortunately, most of the existing methods focus on analysing the presence or absence of...
Uploaded on: September 5, 2023 -
July 2023 (v1)Journal article
Videos and images from camera traps are increasingly used by ecologists to estimate the population of species in a territory. This is laborious work, since experts have to analyse massive data sets manually. Filtering these videos also takes a lot of time when many of them contain no animals or show human presence. Fortunately,...
Uploaded on: April 29, 2023 -
2020 (v1)Journal article
This paper is about the co-clustering of ordinal data. Such data are very common on e-commerce platforms where customers rank the products/services they bought. More specifically, we focus on arrays of ordinal (possibly missing) data involving two disjoint sets of individuals/objects corresponding to the rows/columns of the arrays. Typically, an...
Uploaded on: December 4, 2022 -
September 21, 2022 (v1)Publication
Communication networks such as emails or social networks are now ubiquitous and their analysis has become a strategic field. In many applications, the goal is to automatically extract relevant information by looking at the nodes and their connections. Unfortunately, most of the existing methods focus on analysing the presence or absence of...
Uploaded on: December 3, 2022 -
October 6, 2022 (v1)Publication
The simultaneous clustering of observations and features of data sets (known as co-clustering) has recently emerged as a central machine learning application to summarize massive data sets. However, most existing models focus on continuous and dense data in stationary scenarios, where cluster assignments do not evolve over time. This work...
Uploaded on: December 3, 2022 -
September 18, 2023 (v1)Conference paper
The simultaneous clustering of observations and features of data sets (a.k.a. co-clustering) has recently emerged as a central machine learning task to summarize massive data sets. However, most existing models focus on stationary scenarios, where cluster assignments do not evolve in time. This work introduces a novel latent block model for the...
Uploaded on: September 5, 2023 -
April 13, 2023 (v1)Publication
Numerical interactions leading to users sharing textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand those heterogeneous and complex data structures, clustering nodes into homogeneous groups as well as rendering a...
Uploaded on: February 18, 2024 -
April 13, 2023 (v1)Publication
Numerical interactions leading to users sharing textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand those heterogeneous and complex data structures, clustering nodes into homogeneous groups as well as rendering a...
Uploaded on: January 12, 2024 -
October 1, 2022 (v1)Publication
Videos and images from camera traps are increasingly used by ecologists to estimate the population of species in a territory. Most of the time, this is laborious work, since the experts analyse all this data manually. Filtering these videos also takes a lot of time when many of them are empty or show human presence. Fortunately,...
Uploaded on: December 4, 2022 -
2021 (v1)Journal article
Defining templates of galaxy spectra is useful to quickly characterise new observations and organise databases from surveys. These templates are usually built from a pre-defined classification based on other criteria. Aims. We present an unsupervised classification of 702248 spectra of galaxies and quasars with redshifts smaller than 0.25 that...
Uploaded on: December 4, 2022 -
2020 (v1)Journal article
Patch-based methods are widely used in various topics of image processing, such as image restoration or image editing and synthesis. Patches capture local image geometry and structure and are much easier to model than whole images: in practice, patches are small enough to be represented by simple multivariate priors. An important question...
Uploaded on: December 4, 2022 -
2020 (v1)Journal article
We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood...
Uploaded on: December 4, 2022 -
2021 (v1)Journal article
Finding a set of nested partitions of a dataset is useful to uncover relevant structure at different scales, and is often dealt with using a data-dependent methodology. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Considering the integrated classification likelihood criterion as an objective...
Uploaded on: December 4, 2022 -
October 5, 2022 (v1)Conference paper
With the significant increase of interactions between individuals through numeric means, the clustering of vertices in graphs has become a fundamental approach for analysing large and complex networks. We propose here the deep latent position model (DeepLPM), an end-to-end clustering approach which combines the widely used latent position model...
Uploaded on: December 4, 2022 -
July 17, 2020 (v1)Conference paper
We introduce a deep latent recommender system (deepLTRS) for imputing missing ratings based on the observed ratings and product reviews. Our approach extends a standard variational autoencoder architecture associated with deep latent variable models in order to account for both the ordinal entries and the text entered by users to score and...
Uploaded on: December 4, 2022 -
April 4, 2022 (v1)Publication
With the significant increase of interactions between individuals through numeric means, clustering of vertices in graphs has become a fundamental approach for analyzing large and complex networks. In this work, we propose the deep latent position model (DeepLPM), an end-to-end generative clustering approach which combines the widely used...
Uploaded on: December 3, 2022 -
2021 (v1)Journal article
Multivariate time-dependent data, where multiple features are observed over time for a set of individuals, are increasingly widespread in many application domains. To model these data, we need to account for relations among both time instants and variables and, at the same time, for subject heterogeneity. We propose a new co-clustering...
Uploaded on: December 4, 2022