A Sequential Nonparametric Two-Sample Test
Description
Given samples from two distributions, a nonparametric two-sample test aims at determining whether the two distributions are equal or not, based on a test statistic. This statistic may be computed on the whole dataset, or may be computed on a subset of the dataset by a function trained on its complement. We propose a third tier, consisting of functions exploiting a sequential framework to learn the differences while incrementally processing the data. Sequential processing naturally allows optional stopping, which makes our test the first truly sequential nonparametric two-sample test. We show that any sequential predictor can be turned into a sequential two-sample test for which a valid $p$-value can be computed, yielding controlled type I error. We also show that pointwise universal predictors yield consistent tests, which can be built with a nonparametric regressor based on $k$-nearest neighbors in particular. We also show that mixtures and switch distributions can be used to increase power, while keeping consistency.
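The idea of turning a sequential predictor into a sequential two-sample test can be illustrated with a minimal sketch. It is not taken from the paper and rests on several assumptions: the two samples are pooled into one stream of (point, label) pairs, each label is a fair coin flip indicating which distribution the point came from, the data are one-dimensional, and the predictor is a Laplace-smoothed $k$-nearest-neighbor estimate of the label probability. Under the null hypothesis the labels are independent of the points, so the ratio between the predictor's likelihood and the null likelihood (1/2 per label) is a nonnegative martingale, and by Ville's inequality the inverse of its running maximum is a $p$-value that stays valid under optional stopping. The names `knn_prob_label1`, `sequential_two_sample_test`, and `labelled_stream` are hypothetical.

```python
import math
import random


def knn_prob_label1(history, x, k):
    """Smoothed k-NN estimate of P(label = 1 | x) from past (point, label) pairs."""
    nearest = sorted(history, key=lambda pair: abs(pair[0] - x))[:k]
    ones = sum(label for _, label in nearest)
    # Laplace smoothing keeps the prediction strictly inside (0, 1),
    # and gives 1/2 when no past observation is available.
    return (ones + 1.0) / (len(nearest) + 2.0)


def sequential_two_sample_test(stream, k=5, alpha=0.05):
    """Process (x, label) pairs one at a time and stop as soon as p <= alpha.

    Assumes labels are fair coin flips under the null hypothesis.
    Returns (p_value, number_of_observations_used).
    """
    history = []
    log_m = 0.0      # log of the likelihood ratio (predictor vs. null)
    max_log_m = 0.0  # log of its running maximum (the ratio starts at 1)
    n = 0
    for x, label in stream:
        q1 = knn_prob_label1(history, x, k)   # predict before seeing the label
        q = q1 if label == 1 else 1.0 - q1
        log_m += math.log(q / 0.5)
        max_log_m = max(max_log_m, log_m)
        history.append((x, label))            # then learn from the observation
        n += 1
        p_value = math.exp(-max_log_m)        # Ville's inequality
        if p_value <= alpha:
            return p_value, n                 # optional stopping
    return math.exp(-max_log_m), n


def labelled_stream(sample_p, sample_q, n, rng):
    """Yield n pooled observations, each labelled by a fair coin flip."""
    for _ in range(n):
        label = rng.randint(0, 1)
        x = sample_p(rng) if label == 1 else sample_q(rng)
        yield x, label


if __name__ == "__main__":
    rng = random.Random(0)
    stream = labelled_stream(lambda r: r.gauss(0.0, 1.0),
                             lambda r: r.gauss(5.0, 1.0), 200, rng)
    print(sequential_two_sample_test(stream))
```

When the two distributions differ, a predictor that learns the difference accumulates likelihood ratio and the test stops early; when they are equal, no predictor can beat the fair-coin baseline on average, which is what keeps the type I error below `alpha` in this sketch.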
Additional details
- URL: https://hal.inria.fr/hal-01135608
- URN: urn:oai:HAL:hal-01135608v2
- Origin repository: UNICA