Published March 20, 2025 | Version v1
Publication
What Is a Good Imputation Under MAR Missingness?
Creators
Contributors
Others:
- Médecine de précision par intégration de données et inférence causale (PREMEDICAL) ; Centre Inria d'Université Côte d'Azur (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Desbrest d'Epidémiologie et de Santé Publique (IDESP) ; Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)
- Université Sorbonne Paris Cité (USPC)
- Centre National de la Recherche Scientifique (CNRS)
- Institut Desbrest d'Epidémiologie et de Santé Publique (IDESP) ; Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)
Description
Missing values pose a persistent challenge in modern data science. Consequently, there is an ever-growing number of publications introducing new imputation methods in various fields. The present paper attempts to take a step back and provide a more systematic analysis. Starting from an in-depth discussion of the Missing at Random (MAR) condition for nonparametric imputation, we first develop an identification result showing that the widely used fully conditional specification (FCS) approach indeed identifies the correct conditional distributions. Based on this analysis, we propose three essential properties an ideal imputation method should meet, thus enabling a more principled evaluation of existing methods and more targeted development of new methods. In particular, we introduce a new imputation method, denoted mice-DRF, that meets two out of the three criteria. We also discuss ways to compare imputation methods, based on distributional distances. Finally, numerical experiments illustrate the points made in this discussion.
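To make the idea of comparing imputations through a distributional distance concrete, here is a minimal toy sketch. It is not the paper's procedure and it does not implement mice-DRF. The energy distance is used only as one example of a distributional distance, and the simulated bivariate Gaussian data, the logistic MAR mechanism, the linear-Gaussian draw, and all function names (`mean_pairwise_distance`, `energy_distance`, `compare_imputations`) are assumptions made for illustration. The comparison against the fully observed data is only possible here because the data are simulated.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (a_i, b_j)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def energy_distance(x, y):
    """V-statistic estimate of the energy distance between two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))


def compare_imputations(n=500, rho=0.8, seed=0):
    """Toy comparison (hypothetical setup): mean imputation vs. a conditional draw."""
    rng = np.random.default_rng(seed)

    # Simulated complete data: two correlated standard Gaussian columns.
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    full = np.column_stack([x1, x2])

    # MAR mechanism: x2 goes missing with a probability that depends
    # only on x1, which is always observed.
    missing = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x1))
    observed = ~missing

    # Imputation A: replace missing x2 by the mean of the observed x2.
    mean_imputed = full.copy()
    mean_imputed[missing, 1] = x2[observed].mean()

    # Imputation B: fit x2 ~ x1 on the observed cases, then draw from the
    # fitted conditional distribution (regression line plus Gaussian noise).
    slope, intercept = np.polyfit(x1[observed], x2[observed], deg=1)
    residuals = x2[observed] - (slope * x1[observed] + intercept)
    resid_sd = np.std(residuals, ddof=2)
    noise = resid_sd * rng.normal(size=missing.sum())
    draw_imputed = full.copy()
    draw_imputed[missing, 1] = slope * x1[missing] + intercept + noise

    # Smaller distance to the complete data means a closer joint distribution.
    return (energy_distance(full, mean_imputed),
            energy_distance(full, draw_imputed))
```

Calling `compare_imputations()` returns the two distances in the order (mean imputation, conditional draw). In this toy setting the conditional draw should land closer to the complete data, since mean imputation collapses every missing entry onto a single value and so distorts the joint distribution.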
Additional details
Identifiers
- URL: https://hal.science/hal-04521894
- URN: urn:oai:HAL:hal-04521894v4
Origin repository
- Origin repository: UNICA