Missing values are unavoidable when working with data. Their occurrence is exacerbated as more data from different sources become available. However, most statistical models and visualization methods require complete data, and improper handling of missing data results in information loss or biased analyses. Since the seminal work of Rubin 1976,...
-
July 30, 2022 (v1)Journal article
Uploaded on: December 3, 2022
-
March 22, 2022 (v1)Publication
We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology, initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is...
Uploaded on: December 3, 2022 -
2023 (v1)Journal article
Missing values are unavoidable when working with data. Their occurrence is exacerbated as more data from different sources become available. However, most statistical models and visualization methods require complete data, and improper handling of missing data results in information loss or biased analyses. Since the seminal work of Rubin...
Uploaded on: May 27, 2023 -
December 2022 (v1)Journal article
We propose Robust Lasso-Zero, an extension of the Lasso-Zero methodology, initially introduced for sparse linear models, to the sparse corruptions problem. We give theoretical guarantees on the sign recovery of the parameters for a slightly simplified version of the estimator, called Thresholded Justice Pursuit. The use of Robust Lasso-Zero is...
Uploaded on: May 27, 2023 -
April 14, 2023 (v1)Publication
Federated learning allows for the training of machine learning models on multiple decentralized local datasets without requiring explicit data exchange. However, data pre-processing, including strategies for handling missing data, remains a major bottleneck in real-world federated learning deployment, and is typically performed locally....
Uploaded on: April 20, 2023 -
February 14, 2023 (v1)Publication
Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others. In the missing data literature, such labels are called missing not at random. In this paper, we...
Uploaded on: February 22, 2023 -
February 10, 2023 (v1)Publication
Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, or said missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we...
Uploaded on: February 22, 2023 -
December 17, 2021 (v1)Publication
Traditional ways for handling missing values are not designed for the clustering purpose and they rarely apply to the general case, though frequent in practice, of Missing Not At Random (MNAR) values. This paper proposes to embed MNAR data directly within model-based clustering algorithms. We introduce a mixture model for different types of...
Uploaded on: December 3, 2022 -
May 11, 2021 (v1)Conference paper
International audience
Uploaded on: December 3, 2022 -
December 21, 2023 (v1)Publication
Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, or said missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we...
Uploaded on: December 25, 2023 -
December 21, 2023 (v1)Publication
This document is the accompanying note of the main paper "Model-based Clustering with Missing Not At Random Data". We assume the data contain missing not at random (MNAR) values, i.e. the effect of missingness depends on the missing values themselves. An example includes clinical data collected in emergency situations, where doctors may choose to...
Uploaded on: December 25, 2023 -
June 2, 2021 (v1)Conference paper
International audience
Uploaded on: December 3, 2022