Published March 2015 | Version v1
Report

A Sequential Nonparametric Two-Sample Test

Description

Given samples from two distributions, a nonparametric two-sample test aims at determining whether the two distributions are equal or not, based on a test statistic. This statistic may be computed on the whole dataset, or may be computed on a subset of the dataset by a function trained on its complement. We propose a third tier, consisting of functions exploiting a sequential framework to learn the differences while incrementally processing the data. Sequential processing naturally allows optional stopping, which makes our test the first truly sequential nonparametric two-sample test. We show that any sequential predictor can be turned into a sequential two-sample test for which a valid $p$-value can be computed, yielding controlled type I error. We also show that pointwise universal predictors yield consistent tests, which can be built with a nonparametric regressor based on $k$-nearest neighbors in particular. We also show that mixtures and switch distributions can be used to increase power, while keeping consistency.
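To make the idea of turning a sequential predictor into a test concrete, the following is a minimal sketch and not necessarily the construction used in the report. It assumes the two samples are pooled into a stream of (observation, label) pairs whose labels are fair coin flips under the null hypothesis. A sequential predictor outputs, before each label is revealed, a probability for it; the running likelihood ratio of these predictions against the null probability 1/2 is then a nonnegative martingale under the null, so by Ville's inequality the reciprocal of its running maximum can serve as a $p$-value that remains valid under optional stopping. The function names, the one-dimensional $k$-nearest-neighbor predictor with Laplace smoothing, and the Gaussian data generator are all illustrative assumptions.

```python
import math
import random


def knn_predict(history, x, k):
    """Estimate P(label = 1 | x) from the k nearest past observations."""
    if not history:
        return 0.5  # nothing seen yet: predict the null probability
    nearest = sorted(history, key=lambda pair: abs(pair[0] - x))[:k]
    ones = sum(label for _, label in nearest)
    # Laplace smoothing keeps the prediction strictly inside (0, 1).
    return (ones + 1.0) / (len(nearest) + 2.0)


def sequential_two_sample_test(stream, alpha=0.05, k=5):
    """Process (x, label) pairs one at a time.

    Returns (rejected, steps_used, p_value). Assumes that under the
    null hypothesis each label is 1 with probability 1/2, independently
    of x and of the past.
    """
    history = []
    log_ratio = 0.0       # log of the running likelihood ratio
    max_log_ratio = 0.0   # running maximum, starting at log(1)
    threshold = math.log(1.0 / alpha)
    for x, label in stream:
        p_one = knn_predict(history, x, k)  # prediction made before using label
        p_label = p_one if label == 1 else 1.0 - p_one
        log_ratio += math.log(p_label) - math.log(0.5)
        max_log_ratio = max(max_log_ratio, log_ratio)
        history.append((x, label))
        if max_log_ratio >= threshold:
            # Optional stopping: reject as soon as the evidence suffices.
            return True, len(history), math.exp(-max_log_ratio)
    return False, len(history), math.exp(-max_log_ratio)


def make_stream(n, shift, seed):
    """Hypothetical data: label is a fair coin, x ~ N(label * shift, 1)."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        label = rng.randint(0, 1)
        stream.append((rng.gauss(label * shift, 1.0), label))
    return stream


if __name__ == "__main__":
    print(sequential_two_sample_test(make_stream(500, shift=10.0, seed=0)))
    print(sequential_two_sample_test(make_stream(500, shift=0.0, seed=0)))
```

In this sketch, calling `make_stream` with `shift=0.0` produces data for which the null hypothesis holds, so the test should reject with probability at most `alpha` no matter when the stream is cut off. Any other sequential predictor could be substituted for `knn_predict`, provided its prediction for the current label uses only the current observation and the past.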

Additional details

Created:
March 25, 2023
Modified:
December 1, 2023