The aim of this paper is to establish two fundamental measure-metric properties of particular random geometric graphs. We consider $\varepsilon$-neighborhood graphs whose vertices are drawn independently and identically from a common distribution defined on a regular submanifold of $\mathbb{R}^K$. We show that a volume doubling...
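For reference, the volume doubling property invoked at the end of this abstract is standardly stated as follows for a metric measure space $(X, d, \mu)$; this is a generic textbook formulation, not necessarily the exact form used in the paper.

```latex
% Volume doubling condition for a metric measure space (X, d, mu):
\exists\, C \ge 1 : \quad
\mu\bigl(B(x, 2r)\bigr) \;\le\; C\, \mu\bigl(B(x, r)\bigr)
\quad \text{for all } x \in \operatorname{supp}(\mu),\ r > 0,
\qquad \text{where } B(x, r) = \{\, y \in X : d(x, y) \le r \,\}.
```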
-
December 18, 2020 (v1) Publication
Uploaded on: December 4, 2022
-
October 7, 2021 (v1) Publication
Let $\mathbf{X} = (X_i)_{1\leq i \leq n}$ be an i.i.d. sample of square-integrable variables in $\mathbb{R}^d$, with common expectation $\mu$ and covariance matrix $\Sigma$, both unknown. We consider the problem of testing if $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta$ against $\|\mu\| \geq (\eta + \delta)$; we also tackle the more...
Uploaded on: December 4, 2022 -
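A crude baseline for the closeness-to-zero test described in the preceding entry can be sketched as follows. This is only an illustrative plug-in test using a Gaussian-style concentration bound; the names `eta` and `alpha` are introduced here for illustration and there is no claim that this matches the procedure studied in the paper.

```python
import numpy as np

def naive_closeness_test(X, eta, alpha=0.05):
    """Illustrative test of H0: ||mu|| <= eta against a larger mean norm.

    Rejects H0 when the empirical mean norm exceeds eta plus a plug-in
    concentration radius (Gaussian-style bound with estimated covariance).
    A naive baseline, not the test analyzed in the paper.
    """
    n, d = X.shape
    xbar = X.mean(axis=0)
    Sigma_hat = np.cov(X, rowvar=False)
    # E||xbar - mu|| <= sqrt(tr(Sigma)/n); deviations controlled by the
    # largest eigenvalue (Borell-TIS-type bound, valid for Gaussian data).
    radius = np.sqrt(np.trace(Sigma_hat) / n) \
        + np.sqrt(2 * np.linalg.eigvalsh(Sigma_hat)[-1] * np.log(1 / alpha) / n)
    return np.linalg.norm(xbar) > eta + radius

# Toy usage: a mean far from zero should typically be detected.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.5, scale=1.0, size=(500, 5))
print(naive_closeness_test(X, eta=0.1))
```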
December 6, 2021 (v1) Conference paper
We investigate the problem of minimizing the excess generalization error with respect to the best expert prediction in a finite family in the stochastic setting, under limited access to information. We assume that the learner only has access to the advice of a limited number of experts per training round, as well as at prediction time. Assuming that the...
Uploaded on: December 3, 2022 -
October 31, 2023 (v1) Publication
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show...
Uploaded on: November 25, 2023 -
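As a point of reference for the transductive setting discussed in the preceding entry, here is a minimal split-conformal computation of the $m$ p-values from nonconformity scores. The scoring function and variable names are illustrative choices, not those of the paper.

```python
import numpy as np

def conformal_p_values(cal_scores, test_scores):
    """Split-conformal p-values.

    cal_scores:  nonconformity scores of the n calibration points.
    test_scores: nonconformity scores of the m test points.
    Each p-value is (1 + #{calibration scores >= test score}) / (n + 1).
    """
    cal_scores = np.asarray(cal_scores)
    test_scores = np.asarray(test_scores)
    n = cal_scores.size
    counts = (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)
    return (1.0 + counts) / (n + 1.0)

# Toy usage with absolute residuals as nonconformity scores.
rng = np.random.default_rng(1)
cal = np.abs(rng.normal(size=200))
test = np.abs(rng.normal(size=5))
print(conformal_p_values(cal, test))
```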
2020 (v1) Journal article
We study a non-linear statistical inverse problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization) approach to estimate the quantity for the non-linear ill-posed inverse problem. The estimator is defined...
Uploaded on: December 4, 2022 -
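For orientation, the Tikhonov (regularized least-squares) estimator referred to in the preceding entry takes, in generic notation that may differ from the paper's, the following form.

```latex
% Y_i = (A f^\ast)(X_i) + noise, observed at random design points X_i;
% A is the non-linear forward operator, \mathcal{H} the hypothesis space,
% and \lambda > 0 the regularization parameter.
\hat{f}_\lambda \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{H}}
  \left\{ \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - (A f)(X_i) \bigr)^2
  + \lambda \, \| f \|_{\mathcal{H}}^2 \right\}.
```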
November 16, 2021 (v1) Book section
This document is a book chapter which gives a partial survey on post hoc approaches to false positive control.
Uploaded on: December 4, 2022 -
March 2023 (v1) Publication
We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a...
Uploaded on: June 2, 2023 -
2019 (v1) Conference paper
Rejection sampling is a fundamental Monte Carlo method. It is used to sample from distributions admitting a probability density function that can be evaluated exactly at any given point, albeit at a high computational cost. However, without proper tuning, this technique can incur a high rejection rate. Several methods have been explored to cope...
Uploaded on: December 4, 2022 -
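For concreteness, the basic rejection sampling loop the preceding entry refers to looks like this; the target density, proposal density, proposal sampler, and envelope constant `M` below are illustrative placeholders.

```python
import numpy as np

def rejection_sample(f, g, sample_g, M, size, rng=None):
    """Basic rejection sampling.

    Draws proposals X ~ g and accepts each with probability f(X) / (M * g(X)),
    which requires the envelope condition f(x) <= M * g(x) for all x.
    A loose envelope constant M directly translates into a high rejection rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    accepted = []
    while len(accepted) < size:
        x = sample_g(rng)
        if rng.uniform() <= f(x) / (M * g(x)):
            accepted.append(x)
    return np.array(accepted)

# Toy usage: standard normal target, standard Cauchy proposal (M ~ 1.53 suffices).
target = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
proposal = lambda x: 1.0 / (np.pi * (1 + x ** 2))
samples = rejection_sample(target, proposal, lambda rng: rng.standard_cauchy(),
                           M=1.53, size=2000, rng=np.random.default_rng(0))
print(samples.mean(), samples.std())
```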
July 4, 2023 (v1) Publication
We derive both Azuma-Hoeffding and Burkholder-type inequalities for partial sums over a rectangular grid of dimension $d$ of a random field satisfying a weak dependency assumption of projective type: the difference between the expectation of an element of the random field and its conditional expectation given the rest of the field at a distance...
Uploaded on: July 7, 2023 -
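For orientation, the classical one-dimensional Azuma-Hoeffding inequality that results of this type extend reads as follows, for a martingale $(M_k)_{0 \le k \le n}$ with bounded increments $|M_k - M_{k-1}| \le c_k$.

```latex
\mathbb{P}\bigl( |M_n - M_0| \ge t \bigr)
  \;\le\; 2 \exp\!\left( - \frac{t^2}{2 \sum_{k=1}^{n} c_k^2} \right),
  \qquad t > 0.
```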
December 18, 2020 (v1) Publication
We study the stochastic multi-armed bandit problem in the case when the arm samples are dependent over time and generated from so-called weak $\mathcal{C}$-mixing processes. We establish a $\mathcal{C}$-Mix Improved UCB algorithm and provide both problem-dependent and problem-independent regret analyses in two different scenarios. In the first, so-called fast-mixing...
Uploaded on: December 4, 2022 -
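For reference, the classical UCB index for independent rewards, which the $\mathcal{C}$-Mix variant above builds on, can be sketched as follows. This is the textbook i.i.d. version, not the algorithm of the paper.

```python
import numpy as np

def ucb(pull, n_arms, horizon):
    """Textbook UCB1 for i.i.d. rewards in [0, 1].

    pull(arm) returns a reward; each round we play the arm maximizing
    the empirical mean plus an exploration bonus sqrt(2 log(t) / n_pulls).
    """
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:                      # play each arm once to initialize
            arm = t - 1
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return sums.sum()

# Toy usage: Bernoulli arms with means 0.3, 0.5, 0.7.
rng = np.random.default_rng(2)
means = [0.3, 0.5, 0.7]
total = ucb(lambda a: rng.binomial(1, means[a]), n_arms=3, horizon=5000)
print("average reward:", total / 5000)
```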
October 21, 2020 (v1) Publication
In the setting of supervised learning using reproducing kernel methods, we propose a data-dependent regularization parameter selection rule that is adaptive to the unknown regularity of the target function and is optimal both for the least-squares (prediction) error and for the reproducing kernel Hilbert space (reconstruction) norm error. It is...
Uploaded on: December 4, 2022 -
June 2020 (v1) Journal article
We follow a post-hoc, "user-agnostic" approach to false discovery control in a large-scale multiple testing framework, as introduced by Genovese and Wasserman (2006), Goeman and Solari (2011): the statistical guarantee on the number of correct rejections must hold for any set of candidate items, possibly selected by the user after having seen...
Uploaded on: December 4, 2022 -
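One concrete instance of such a simultaneous guarantee is the Simes-based post hoc bound associated with Goeman and Solari (2011). The snippet below computes that bound directly, purely for illustration; conventions (strict versus non-strict inequalities, refinements) may differ from those used in the paper.

```python
import numpy as np

def simes_post_hoc_bound(p_selected, m, alpha=0.05):
    """Upper confidence bound on the number of false positives in a
    user-selected set S, based on the Simes inequality.

    p_selected: p-values of the hypotheses in S (any data-dependent choice).
    m:          total number of hypotheses tested.
    Bound: min over k of  #{i in S : p_i >= alpha * k / m} + k - 1,
    capped at |S|.
    """
    p = np.asarray(p_selected)
    s = p.size
    ks = np.arange(1, s + 1)
    counts = np.array([(p >= alpha * k / m).sum() for k in ks])
    return int(min(s, (counts + ks - 1).min()))

# Toy usage: 10 selected p-values out of m = 1000 hypotheses.
rng = np.random.default_rng(3)
p_sel = np.sort(rng.uniform(size=10)) * 0.01
print(simes_post_hoc_bound(p_sel, m=1000))
```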
October 9, 2024 (v1) Publication
We provide new nonasymptotic false discovery proportion (FDP) confidence envelopes in several multiple testing settings relevant for modern high-dimensional data methods. We revisit the multiple testing scenarios considered in the recent work of Katsevich and Ramdas (2020): top-k, preordered (including knockoffs), online. Our emphasis is on...
Uploaded on: October 10, 2024 -
June 1, 2023 (v1) Publication
We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $\delta$, the goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the...
Uploaded on: June 7, 2023 -
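A textbook fixed-confidence baseline for this problem is successive elimination with Hoeffding-style confidence radii. The sketch below is such a baseline, with an illustrative and conservative choice of radius; it is not the algorithm proposed in the paper.

```python
import numpy as np

def successive_elimination(pull, n_arms, delta):
    """Fixed-confidence best arm identification via successive elimination.

    Keeps a set of candidate arms, pulls each survivor once per round, and
    eliminates any arm whose upper confidence bound falls below the best
    lower confidence bound. The radius uses a conservative union bound;
    rewards are assumed to lie in [0, 1].
    """
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            sums[a] += pull(a)
            counts[a] += 1
        means = sums[active] / counts[active]
        radius = np.sqrt(np.log(4.0 * n_arms * t ** 2 / delta) / (2.0 * t))
        best_lcb = (means - radius).max()
        active = [a for a, mu in zip(active, means) if mu + radius >= best_lcb]
    return active[0], int(counts.sum())

# Toy usage: Bernoulli arms; the last arm has the highest mean.
rng = np.random.default_rng(4)
means = [0.4, 0.5, 0.6]
arm, pulls = successive_elimination(lambda a: rng.binomial(1, means[a]),
                                    n_arms=3, delta=0.05)
print("identified arm:", arm, "after", pulls, "pulls")
```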
April 13, 2021 (v1) Conference paper
We propose an improved estimator for the multi-task averaging problem, whose goal is the joint estimation of the means of multiple distributions using separate, independent data sets. The naive approach is to take the empirical mean of each data set individually, whereas the proposed method exploits similarities between tasks, without any...
Uploaded on: December 4, 2022 -
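As a point of comparison with the naive approach mentioned above, here is a minimal illustration of the two extremes: per-task empirical means versus a simple shrinkage of each task mean toward the grand mean. The weight `lam` below is an illustrative hyperparameter, and this is a James-Stein-style toy, not the estimator proposed in the paper.

```python
import numpy as np

def naive_means(tasks):
    """Empirical mean of each task's data set, estimated separately."""
    return np.array([np.mean(x) for x in tasks])

def shrunk_means(tasks, lam=0.5):
    """Shrink each task mean toward the grand mean by a factor lam in [0, 1].

    Exploits similarity between tasks: helpful when the true task means are
    close to each other, harmful when they are very heterogeneous.
    """
    m = naive_means(tasks)
    return (1 - lam) * m + lam * m.mean()

# Toy usage: 20 similar tasks with few samples each.
rng = np.random.default_rng(5)
true_means = rng.normal(loc=1.0, scale=0.1, size=20)
tasks = [rng.normal(mu, 1.0, size=10) for mu in true_means]
for est in (naive_means(tasks), shrunk_means(tasks, lam=0.7)):
    print("mean squared error:", np.mean((est - true_means) ** 2))
```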
February 14, 2021 (v1) Publication
Greedy algorithms for feature selection are widely used for recovering sparse high-dimensional vectors in linear models. In classical procedures, the main emphasis was put on the sample complexity, with little or no consideration of the computational resources required. We present a novel online algorithm: Online Orthogonal Matching Pursuit...
Uploaded on: December 4, 2022 -
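For context, the classical (batch) orthogonal matching pursuit that the online algorithm above revisits proceeds as follows. This sketch is the standard textbook version, not the paper's online variant.

```python
import numpy as np

def orthogonal_matching_pursuit(X, y, sparsity):
    """Classical (batch) OMP for sparse linear regression y ~ X w.

    At each step, selects the feature most correlated with the current
    residual, then refits y by least squares on the selected support.
    """
    n, d = X.shape
    support, residual = [], y.copy()
    w = np.zeros(d)
    for _ in range(sparsity):
        correlations = np.abs(X.T @ residual)
        correlations[support] = -np.inf          # do not reselect features
        support.append(int(np.argmax(correlations)))
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ w_s
    w[support] = w_s
    return w

# Toy usage: recover a 3-sparse vector from noisy linear measurements.
rng = np.random.default_rng(6)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.01 * rng.normal(size=200)
print(np.nonzero(orthogonal_matching_pursuit(X, y, sparsity=3))[0])
```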
2023 (v1) Conference paper
We consider the problem of best arm identification in the multi-armed bandit model, under fixed confidence. Given a confidence input $\delta$, the goal is to identify the arm with the highest mean reward with a probability of at least $1 - \delta$, while minimizing the number of arm pulls. While the literature provides solutions to this problem under the...
Uploaded on: December 7, 2023 -
August 21, 2021 (v1) Journal article
We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper. The principle of compressive statistical learning is to compress a training collection, in one pass, into a...
Uploaded on: July 4, 2023 -
August 21, 2021 (v1) Journal article
We describe a general framework --compressive statistical learning-- for resource-efficient large-scale learning: the training collection is compressed in one pass into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. A near-minimizer of the risk...
Uploaded on: December 4, 2022 -
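To make the notion of a sketch concrete: one common instance of "a vector of random empirical generalized moments" is the empirical average of random Fourier features, computed in a single pass over the data, as in the hedged sketch below. The feature scale `sigma` and sketch size `m` are illustrative choices, not parameters prescribed by the paper.

```python
import numpy as np

def random_fourier_sketch(X, m=256, sigma=1.0, rng=None):
    """Compress a data set into a single sketch vector.

    The sketch is the empirical mean of m random Fourier features
    exp(i <w_j, x>) with frequencies w_j drawn from N(0, 1/sigma^2), i.e. a
    vector of random empirical generalized moments. Learning then operates
    on this m-dimensional vector instead of the full data set.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(m, d))   # random frequencies
    return np.exp(1j * X @ W.T).mean(axis=0)         # one pass over the data

# Toy usage: two samples of the same distribution, sketched with the same
# frequencies (same seed), yield nearly identical sketches.
rng = np.random.default_rng(7)
s1 = random_fourier_sketch(rng.normal(size=(10_000, 2)), rng=np.random.default_rng(123))
s2 = random_fourier_sketch(rng.normal(size=(10_000, 2)), rng=np.random.default_rng(123))
print(np.linalg.norm(s1 - s2))
```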
June 8, 2023 (v1) Publication
Quantification learning deals with the task of estimating the target label distribution under label shift. In this paper, we first present a unifying framework, distribution feature matching (DFM), that recovers as particular instances various estimators introduced in previous literature. We derive a general performance bound for DFM...
Uploaded on: June 10, 2023 -
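A minimal sketch of the matching idea in a generic form (the feature map, solver, and names below are illustrative and need not coincide with the estimators unified in the paper): represent each class by the average feature vector of its training examples, and estimate the target label distribution as the mixture weights that best reproduce the average feature vector of the unlabeled target sample.

```python
import numpy as np
from scipy.optimize import minimize

def dfm_estimate(phi_train, y_train, phi_target, n_classes):
    """Estimate the target label distribution under label shift by matching
    mean feature vectors: find simplex weights w such that the weighted sum
    of per-class training feature means best reproduces the mean feature
    vector of the unlabeled target sample.
    """
    # Column c = mean feature vector of training class c.
    M = np.stack([phi_train[y_train == c].mean(axis=0)
                  for c in range(n_classes)], axis=1)
    q = phi_target.mean(axis=0)          # mean feature vector of target data
    res = minimize(lambda w: np.sum((M @ w - q) ** 2),
                   x0=np.full(n_classes, 1.0 / n_classes),
                   bounds=[(0.0, 1.0)] * n_classes,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.x

# Toy usage with raw features; the target mixes the classes 30% / 70%.
rng = np.random.default_rng(8)
phi_train = np.vstack([rng.normal(0, 1, (1000, 5)), rng.normal(2, 1, (1000, 5))])
y_train = np.repeat([0, 1], 1000)
phi_target = np.vstack([rng.normal(0, 1, (300, 5)), rng.normal(2, 1, (700, 5))])
print(dfm_estimate(phi_train, y_train, phi_target, n_classes=2))
```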
December 1, 2020 (v1) Journal article
In a high-dimensional multiple testing framework, we present new confidence bounds on the false positives contained in subsets $S$ of selected null hypotheses. These bounds are post hoc in the sense that the coverage probability holds simultaneously over all $S$, possibly chosen depending on the data. This article focuses on the common case of...
Uploaded on: December 4, 2022 -
May 1, 2021 (v1) Publication
We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a...
Uploaded on: December 4, 2022