hypothesis-testing sequential-statistics plug-in-method mixture-method

In game-theoretic hypothesis testing, suppose we are testing a simple null against a composite alternative. We receive samples $(X_{t})$ and are testing

$$H_{0}:(X_{t})∼P,\quad H_{1}:(X_{t})∼Q∈Θ_{1}.$$

In the simple vs simple case of testing by betting, the optimal payoff function turned out to be simply the likelihood ratio of $Q$ and $P$. But now there is no single $Q$, so the likelihood ratio isn't well-defined. What do we do?
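As a baseline, the simple vs simple case can be sketched as a betting process whose wealth multiplies the likelihood ratio each round. The Gaussian pair below (H0: N(0,1) vs H1: N(1,1)) is a hypothetical choice for illustration; the rejection threshold $1/α$ comes from Ville's inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simple-vs-simple setup: H0: N(0,1) vs H1: N(1,1).
# The bettor's wealth multiplies the likelihood ratio q(x)/p(x) each round.
def lr(x, mu0=0.0, mu1=1.0):
    # Density ratio N(x; mu1, 1) / N(x; mu0, 1)
    return np.exp(-0.5 * (x - mu1) ** 2 + 0.5 * (x - mu0) ** 2)

x = rng.normal(1.0, 1.0, size=200)   # data actually drawn from H1
wealth = np.cumprod(lr(x))

# By Ville's inequality, wealth exceeds 1/alpha with probability at most
# alpha under H0, so crossing 1/alpha = 20 rejects at level 0.05.
print(wealth[-1] > 20)
```

Under H0 the wealth is a nonnegative martingale with initial value 1, which is what licenses the anytime-valid threshold.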

Since the observations are revealed sequentially (we can simulate this in a batch setting), it is natural to try to "learn" an appropriate $Q∈Θ_{1}$ with good power (you can view this as trying to learn the true distribution $Q_{∗}∈Θ_{1}$ from the data observed thus far).

There are two general methods.

# Plug-in method

Since the method is inherently sequential and the payoff function $S_{t}$ (see game-theoretic hypothesis testing:Payoff functions) need only be $F_{t−1}$-measurable, we can run a likelihood-ratio test but change which $Q∈Θ_{1}$ is used at each step. That is, if $(X_{t})∼Q_{∗}$, we can try to learn $Q_{∗}$ over time.

In particular, we can consider the payoff function $S_{t}=q_{t}(X_{t}∣F_{t−1})/p(X_{t}∣F_{t−1})$, where $Q_{t}$ (with density $q_{t}$) can be chosen based on $X_{1},…,X_{t−1}$. Regardless of how it is chosen, the resulting wealth process remains a test martingale for $P$. (We assume densities are well-defined; otherwise we resort to Radon–Nikodym derivatives.)

While it may be tempting to choose $Q_{t}$ by maximum likelihood, this is inadvisable when the data are discrete: the MLE may assign probability 0 to an outcome that in fact occurs, in which case the wealth drops to zero and we go broke. Smoothing the estimate avoids this.
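A minimal sketch of the plug-in method, under an assumed Bernoulli setup (H0: Bernoulli(0.5) vs the composite H1: Bernoulli(q), q unknown). The estimate $q_{t}$ uses Laplace (add-one) smoothing rather than the raw MLE, so it never equals 0 or 1 and the payoff never assigns zero probability to an observed outcome:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical plug-in e-process: H0: Bernoulli(0.5) vs H1: Bernoulli(q).
# q_t is estimated from X_1..X_{t-1} only, so S_t stays F_{t-1}-measurable.
def plugin_wealth(x, p0=0.5):
    wealth = 1.0
    heads = 0
    for t, xt in enumerate(x):
        q_t = (heads + 1) / (t + 2)          # Laplace-smoothed estimate, in (0,1)
        q_density = q_t if xt == 1 else 1 - q_t
        p_density = p0 if xt == 1 else 1 - p0
        wealth *= q_density / p_density      # S_t = q_t(X_t) / p(X_t)
        heads += xt                          # update estimate with X_t
    return wealth

x = rng.binomial(1, 0.9, size=300)           # data drawn from Bernoulli(0.9)
print(plugin_wealth(x) > 20)                 # strong evidence against H0
```

Because the smoothed estimate stays strictly inside $(0,1)$, the wealth can never hit zero, regardless of the data sequence.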

# The mixture method

Instead of choosing a particular $Q∈Θ_{1}$ to use at each timestep, we mix over all such distributions by placing a distribution over $Θ_{1}$. That is, we take

$$S_{t}(X_{t})=\frac{∫_{q∈Θ_{1}}q(X_{t}∣F_{t−1})\,ρ_{t}(dq)}{p(X_{t}∣F_{t−1})},$$

where $ρ_{t}$ is the mixing distribution, which may depend on $F_{t−1}$. Since an average of distributions is itself a distribution, the numerator is a valid conditional density and the wealth process remains a test martingale. Note, however, that there is no guarantee the numerator corresponds to a distribution in $Θ_{1}$ (unless $Θ_{1}$ happens to be fork-convex).