A closed-form sample size correction for always-valid inference with optional stopping
Abstract
Sequential tests that allow continuous monitoring are common in A/B experimentation. Power calculations for these tests require simulations that are hard to scale across many metrics on an experimentation platform. Instead, a common sizing heuristic inflates the fixed-sample size until the marginal rejection probability at the planned endpoint reaches 1-β. This last-point rule is conservative because always-valid (AV) power is the probability of a boundary crossing at any time during the run, not at the endpoint alone. We give a closed-form correction factor k(α, β, t0) expressed in elementary functions and the bivariate normal CDF, where t0 = m/nz is the burn-in fraction and nz is the fixed-sample size for allocation ratio r. The approximation depends on the boundary only through its value and slope at the planned endpoint and can be evaluated for any smooth concave boundary. We work out three cases: the confidence sequences of Waudby-Smith et al. (2023) and Maharaj et al. (2023), and the mixture sequential probability ratio test of Johari et al. (2022). Setting the total sample size to k · nz achieves empirical power within approximately 3 percentage points of target in Gaussian simulations. The correction factor depends on the allocation ratio r only through t0 = m/nz(r). We study sensitivity to the burn-in parameter and show that the correction saves 8 to 20% of the last-point sample budget across the operating range.
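As a rough illustration of the sizing recipe described above (total sample size = k · nz), the following Python sketch computes a fixed-sample size nz and applies a correction factor. Several pieces are assumptions not taken from the abstract: nz is computed with the textbook two-sided, two-sample z-test formula for a common known standard deviation, r is read as the treatment-to-control allocation ratio, and the function names are hypothetical. The closed form for k(α, β, t0) is not stated in the abstract, so k is passed in by the caller rather than computed.

```python
from statistics import NormalDist


def fixed_sample_size(alpha, beta, delta, sigma, r=1.0):
    """Total fixed-sample size nz for a two-sided two-sample z-test.

    Assumption (not from the abstract): textbook formula with a common
    known standard deviation `sigma`, minimum detectable effect `delta`,
    and allocation ratio r = n_treatment / n_control.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    # Var(difference in means) = sigma^2 * (1 + r)^2 / (r * n_total)
    return (z_alpha + z_beta) ** 2 * sigma ** 2 * (1.0 + r) ** 2 / (r * delta ** 2)


def burn_in_fraction(m, nz):
    """Burn-in fraction t0 = m / nz, as defined in the abstract."""
    return m / nz


def corrected_total(k, nz):
    """Total sample size k * nz; k must be supplied by the caller."""
    return k * nz


nz = fixed_sample_size(alpha=0.05, beta=0.2, delta=0.1, sigma=1.0, r=1.0)
t0 = burn_in_fraction(m=100, nz=nz)      # m = 100 is an arbitrary example value
total = corrected_total(k=1.5, nz=nz)    # k = 1.5 is a placeholder, not a result from the paper
```

Because r enters only through nz, an unbalanced allocation changes t0 (and therefore k) without changing the form of the calculation, which matches the abstract's statement that k depends on r only through t0 = m/nz(r).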