E-values

This works even if, as often happens in practice, the decision to perform later experiments may depend in vague, unknown ways on the data observed in earlier experiments, and it is not known beforehand how many trials will be conducted: the product e-value remains a meaningful quantity, leading to tests with Type-I error control.

For this reason, e-values and their sequential extension, the e-process, are the fundamental building blocks for anytime-valid statistical methods (e.g. confidence sequences).

In practice, the term e-value (a number) is often used when one is really referring to the underlying e-variable (a random variable, that is, a measurable function of the data).
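For reference, the standard definition can be written out as follows. The symbols $E$, $H_0$ and $\alpha$ are notation assumed here for illustration; the surrounding text does not fix them.

```latex
% An e-variable for the null hypothesis H_0 is a nonnegative random
% variable E whose expectation is at most 1 under every null distribution:
\[
  E \ge 0
  \quad\text{and}\quad
  \mathbb{E}_P[E] \le 1 \quad \text{for all } P \in H_0 .
\]
% Markov's inequality then gives Type-I error control for the test
% that rejects H_0 when E >= 1/alpha:
\[
  P\!\left(E \ge \tfrac{1}{\alpha}\right)
  \;\le\; \alpha\,\mathbb{E}_P[E]
  \;\le\; \alpha
  \quad \text{for all } P \in H_0,\ 0 < \alpha \le 1 .
\]
```

The e-value is then the number obtained by evaluating $E$ on the observed data.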

From this angle, the main innovation of the e-value compared to traditional testing is to maximize a different power target.

Type-I error, as it allows one to choose the significance level after observing the data: post-hoc.

vanishes: we can simply choose the smallest data-dependent level at which we reject the hypothesis by setting it equal to the post-hoc p-value:

The two main ways of constructing e-variables, UI and RIPr (see below), both lead to expressions that are also variations of likelihood ratios.

is an e-variable" and "if the null hypothesis is true, you do not expect to gain any money if you engage in this bet" are logically equivalent.

being an e-variable means that the expected gain of buying the ticket is the pay-off minus the cost, i.e.

Waudby-Smith and Ramdas use this approach to construct "nonparametric" confidence intervals for the mean that tend to be significantly narrower than those based on more classical methods such as Chernoff, Hoeffding and Bernstein bounds.

[6] E-values are more suitable than p-values when one expects follow-up tests involving the same null hypothesis with different data or experimental set-ups.

We say that testing based on e-values remains safe (Type-I valid) under optional continuation.

Mathematically, this is established by first showing that the product e-variables form a nonnegative discrete-time martingale in the filtration generated by

The results then follow as a consequence of Doob's optional stopping theorem and Ville's inequality.
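The argument can be sketched in symbols as follows. The notation ($E_i$ for the e-variable of the $i$-th study, $\mathcal{F}_i$ for the information available after study $i$, $E^{(n)}$ for the running product) is assumed here for illustration and may differ from the notation used elsewhere in the article.

```latex
% Each new e-variable has conditional expectation at most 1 under the null,
% whatever rule was used to decide that study i should be run:
\[
  \mathbb{E}_P\!\left[E_i \mid \mathcal{F}_{i-1}\right] \le 1
  \quad \text{for all } P \in H_0 .
\]
% Hence the running product does not grow in expectation:
\[
  E^{(n)} = \prod_{i=1}^{n} E_i ,
  \qquad
  \mathbb{E}_P\!\left[E^{(n)} \mid \mathcal{F}_{n-1}\right]
  = E^{(n-1)}\,\mathbb{E}_P\!\left[E_n \mid \mathcal{F}_{n-1}\right]
  \le E^{(n-1)} .
\]
% Ville's inequality for nonnegative (super)martingales started at 1:
\[
  P\!\left(\exists\, n \ge 1 :\ E^{(n)} \ge \tfrac{1}{\alpha}\right) \le \alpha
  \quad \text{for all } P \in H_0 .
\]
```

When the conditional expectations equal 1 exactly, the product is a martingale; with the inequality it is a supermartingale, and Ville's inequality applies in both cases.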

independently of the data we get a trivial e-value: it is an e-variable by definition, but it will never allow us to reject the null hypothesis.

Grünwald et al. show that under weak regularity conditions, the GRO e-variable exists, is essentially unique, and is given by

[4] In parametric settings, we can simply combine the main methods for the composite alternative (obtaining

The advantages of the UI method compared to RIPr are that (a) it can be applied whenever the MLE can be efficiently computed, whereas in many such cases it is not known whether or how the reverse information projection can be calculated; and (b) it 'automatically' gives not just an e-variable but a full e-process (see below): if we replace

, the resulting ratio is still an e-variable; for the reverse information projection this automatic e-process generation only holds in special cases.

Its main disadvantage compared to RIPr is that it can be substantially sub-optimal in terms of the e-power/GRO criterion, which means that it leads to tests which also have less classical statistical power than RIPr-based methods.

[4] Finally, in practice, one sometimes resorts to mathematically or computationally convenient combinations of RIPr, UI and other methods.

In basic cases, the stopping time can be defined by any rule that determines, at each sample size

For example, her boss may tell her to stop collecting data and she may not know exactly why; nevertheless, she gets a valid e-variable and Type-I error control.

This is in sharp contrast to data analysis based on p-values (which becomes invalid if stopping rules are not determined in advance) and to classical Wald-style sequential analysis (which works with data of varying length but, again, with stopping times that need to be determined in advance).
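The contrast can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the text: the data are i.i.d. standard normal (so the null hypothesis is true), the e-process is the running likelihood ratio of N(mu_alt, 1) against N(0, 1), and the p-value analyst runs a one-sided z-test after every observation. The function name `simulate_peeking` and all parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_peeking(n_sims=2000, max_n=200, alpha=0.05, mu_alt=0.5, seed=0):
    """Estimate Type-I error when the analyst may stop at any sample size.

    Returns (e_rate, p_rate): the fraction of null data sets on which the
    e-process ever reaches 1/alpha, and the fraction on which a fixed-level
    z-test, recomputed after every observation, ever rejects.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    log_threshold = math.log(1 / alpha)        # reject when e-process >= 1/alpha
    e_rejections = 0
    p_rejections = 0
    for _ in range(n_sims):
        log_e = 0.0      # log of the running likelihood-ratio e-process
        total = 0.0      # running sum of the observations
        e_hit = False
        p_hit = False
        for n in range(1, max_n + 1):
            x = rng.gauss(0.0, 1.0)            # data drawn from the null N(0, 1)
            total += x
            # log likelihood ratio of N(mu_alt, 1) versus N(0, 1) for one point
            log_e += mu_alt * x - 0.5 * mu_alt ** 2
            if log_e >= log_threshold:
                e_hit = True
            # naive "peeking": z-test on the first n observations
            if total / math.sqrt(n) >= z_crit:
                p_hit = True
        e_rejections += e_hit
        p_rejections += p_hit
    return e_rejections / n_sims, p_rejections / n_sims


if __name__ == "__main__":
    e_rate, p_rate = simulate_peeking()
    print(f"e-process ever >= 1/alpha: {e_rate:.3f}")
    print(f"z-test ever rejects:       {p_rate:.3f}")
```

Recording whether the threshold was ever crossed corresponds to the most aggressive stopping rule, "stop as soon as you can reject". Under that rule the e-process rejection rate should stay near or below alpha, as Ville's inequality guarantees, while the repeatedly applied z-test rejects far more often than its nominal level.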

In more complex cases, the stopping time has to be defined relative to some slightly reduced filtration, but this is not a big restriction in practice.

is a test supermartingale, and hence also an e-process (note that we already used this construction in the example described under "e-values as bets" above: for fixed

[4] Historically, e-values implicitly appear as building blocks of nonnegative supermartingales in the pioneering work on anytime-valid confidence methods by well-known mathematician Herbert Robbins and some of his students.

[18] The first time e-values (or something very much like them) are treated as a quantity of independent interest is by another well-known mathematician, Leonid Levin, in 1976, within the theory of algorithmic randomness.

With the exception of contributions by pioneer V. Vovk in various papers with various collaborators (e.g. [16][15]), and an independent re-invention of the concept in an entirely different field,[19] the concept did not catch on until 2019, when, within just a few months, several pioneering papers by several research groups appeared on arXiv (the corresponding journal publications referenced below sometimes appeared years later).

In 2023 the first overview paper on "safe, anytime-valid methods", in which e-values play a central role, appeared.