In traditional hypothesis testing we fix a significance level $\alpha \in (0,1]$ and look for tests $\phi_\alpha \in \{0,1\}$ which have type-I error bounded by $\alpha$:

$$\sup_{P \in H_0} \mathbb{E}_P[\phi_\alpha] \leq \alpha,$$

where $H_0$ is the null. To do post-hoc hypothesis testing, we want to be able to quantify over $\alpha$, so we rewrite this as a bound on a risk:

$$\sup_{P \in H_0} \mathbb{E}_P\left[\frac{\phi_\alpha}{\alpha}\right] \leq 1.$$
(We make a similar move when viewing hypothesis testing through the lens of decision theory: the Neyman-Pearson paradigm with losses.) To allow $\alpha$ to be data-dependent, we add a supremum over $\alpha$ inside the expectation (it is implicitly outside the expectation in the previous display):

$$\sup_{P \in H_0} \mathbb{E}_P\left[\sup_{\alpha \in (0,1]} \frac{\phi_\alpha}{\alpha}\right] \leq 1.$$
We call a rule $(\phi_\alpha)_{\alpha \in (0,1]}$ which obeys the above inequality a post-hoc valid hypothesis test.
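To see that the inner supremum is a genuine strengthening, it may help to contrast with p-values; this example is not in the displays above, but follows from them by a direct computation. For the usual test $\phi_\alpha = \mathbf{1}\{p \leq \alpha\}$ with $p$ uniform under the null, each fixed $\alpha$ passes the earlier risk bound, yet

$$\mathbb{E}_P\left[\sup_{\alpha \in (0,1]} \frac{\phi_\alpha}{\alpha}\right] = \mathbb{E}_P\left[\frac{1}{p}\right] = \int_0^1 \frac{1}{p}\, dp = \infty,$$

so p-value thresholding is not post-hoc valid.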
Suppose the test is based on an e-value, i.e., for a given $\alpha$, $\phi_\alpha = \mathbf{1}\{E \geq 1/\alpha\}$ for some e-variable $E$ for $H_0$. Then $\sup_\alpha \phi_\alpha/\alpha \leq E$ by case analysis on the numerator: on the event $\{E \geq 1/\alpha\}$ we have $\phi_\alpha/\alpha = 1/\alpha \leq E$, and otherwise $\phi_\alpha/\alpha = 0 \leq E$. So

$$\sup_{P \in H_0} \mathbb{E}_P\left[\sup_{\alpha \in (0,1]} \frac{\phi_\alpha}{\alpha}\right] \leq \sup_{P \in H_0} \mathbb{E}_P[E] \leq 1,$$

meaning that tests based on thresholding e-values enable post-hoc hypothesis testing.
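As a quick numerical sanity check of the inequality above, here is a minimal simulation sketch; the Gaussian null, the likelihood-ratio e-variable, and all names in it are illustrative assumptions rather than anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Data drawn under the (assumed) null H0: X ~ N(0, 1).
x = rng.standard_normal(n_trials)

# E-variable: likelihood ratio dN(1,1)/dN(0,1) evaluated at X.
# Under H0, E_P[E] = E[exp(X - 1/2)] = 1, so E is a valid e-variable.
e = np.exp(x - 0.5)

# For phi_alpha = 1{E >= 1/alpha} with alpha in (0, 1]:
#   sup_alpha phi_alpha / alpha = E * 1{E >= 1}
# (the supremum is attained at alpha = 1/E when E >= 1, and no alpha
# in (0, 1] triggers a rejection when E < 1).
posthoc_risk = np.where(e >= 1.0, e, 0.0)

print(f"mean of E:               {e.mean():.4f}")             # ~ 1
print(f"mean of sup_a phi_a / a: {posthoc_risk.mean():.4f}")  # <= 1
```

The check uses the identity $\sup_{\alpha \in (0,1]} \phi_\alpha/\alpha = E\,\mathbf{1}\{E \geq 1\}$, which is also why the simulated post-hoc risk typically comes out strictly below $\mathbb{E}_P[E]$.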
The other direction is true as well: if one has a post-hoc valid test that is based on thresholding some nonnegative statistic $T$, i.e., $\phi_\alpha = \mathbf{1}\{T \geq 1/\alpha\}$, then $T$ must be an e-value. You can see this by taking $\alpha = 1/T$ in the definition of post-hoc validity, which is legitimate precisely because the supremum over $\alpha$ sits inside the expectation.
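Spelled out: values of $T$ below $1$ never trigger a rejection at any $\alpha \in (0,1]$, and on the event $\{T \geq 1\}$ the supremum is attained at the data-dependent choice $\hat{\alpha} = 1/T$, so

$$\sup_{\alpha \in (0,1]} \frac{\phi_\alpha}{\alpha} = \sup_{\alpha \in (0,1]} \frac{\mathbf{1}\{T \geq 1/\alpha\}}{\alpha} = T\, \mathbf{1}\{T \geq 1\},$$

and post-hoc validity gives

$$\sup_{P \in H_0} \mathbb{E}_P\bigl[T\, \mathbf{1}\{T \geq 1\}\bigr] \leq \sup_{P \in H_0} \mathbb{E}_P\left[\sup_{\alpha \in (0,1]} \frac{\phi_\alpha}{\alpha}\right] \leq 1,$$

so $T$ (truncated below $1$, which leaves the test itself unchanged) is an e-variable.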
This was first pointed out in "False discovery rate control with e-values" by Wang and Ramdas (2022) and in "Beyond Neyman-Pearson" by Grünwald (2022).