Published May 1, 2021 | Version v1
Publication

Post hoc false discovery proportion inference under a Hidden Markov Model

Others:
Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145) ; Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité)
Laboratoire de Mathématiques d'Orsay (LMO) ; Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
Understanding the Shape of Data (DATASHAPE) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Saclay - Ile de France ; Institut National de Recherche en Informatique et en Automatique (Inria)
Institut de Mathématiques de Toulouse UMR5219 (IMT) ; Université Toulouse 1 Capitole (UT1) ; Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse) ; Institut National des Sciences Appliquées (INSA)-Université Fédérale Toulouse Midi-Pyrénées-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3) ; Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)
Laboratoire de Probabilités, Statistique et Modélisation (LPSM (UMR_8001)) ; Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université Paris Cité (UPCité)
ANR-16-CE40-0019,SansSouci,Approches post hoc pour les tests multiples à grande échelle(2016)
ANR-19-CHIA-0021,BISCOTTE,Approches statistiquement et computationnellement efficicaces pour l'intelligence artificielle(2019)
ANR-17-CE40-0001,BASICS,Bayésien non-paramétrique, quantification de l'incertitude et structures aléatoires(2017)

Description

We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a controlled False Discovery Rate (FDR) under this model, following a recent trend in selective inference, we consider the problem of establishing confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds first when the HMM model is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we propose a bootstrap-based methodology to take into account the effect of parameter estimation error. We show that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.

Additional details

Created:
December 4, 2022
Modified:
December 1, 2023