Early Stopping Based on Repeated Significance

Abstract

For a bucket test with a single criterion for success and a fixed number of samples or a fixed testing period, requiring a p-value less than a specified value α for the success criterion produces statistical confidence at level 1 - α. For multiple criteria, a Bonferroni correction that partitions α among the criteria preserves that level of confidence, at the cost of requiring a lower p-value for each criterion. The same concept can be applied to decisions about early stopping, but partitioning α further among the decision points can make the p-value requirements very strict. We show how to address that challenge by requiring criteria to be successful at multiple decision points.
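
To make the cost of partitioning α concrete, the following is a minimal sketch of an equal Bonferroni split across criteria and decision points. It illustrates only the baseline correction the abstract describes, not the paper's proposed method. The function names, the equal split, and the example values are assumptions chosen for illustration.

```python
from typing import Sequence


def bonferroni_threshold(alpha: float, n_criteria: int, n_decision_points: int = 1) -> float:
    """Per-test p-value threshold when alpha is split equally (an assumption)
    among every (criterion, decision point) pair."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if n_criteria < 1 or n_decision_points < 1:
        raise ValueError("counts must be positive integers")
    return alpha / (n_criteria * n_decision_points)


def all_criteria_pass(p_values: Sequence[float], alpha: float, n_decision_points: int = 1) -> bool:
    """True if every criterion's p-value is below its Bonferroni share of alpha.

    p_values holds one p-value per success criterion at a single decision point.
    """
    threshold = bonferroni_threshold(alpha, len(p_values), n_decision_points)
    return all(p < threshold for p in p_values)


if __name__ == "__main__":
    # Hypothetical p-values for two success criteria.
    p_values = [0.01, 0.02]
    # Fixed-length test: each criterion must beat 0.05 / 2.
    print(all_criteria_pass(p_values, alpha=0.05))
    # Five possible stopping points: each test must beat 0.05 / (2 * 5).
    print(all_criteria_pass(p_values, alpha=0.05, n_decision_points=5))
```

With two criteria and α = 0.05, each criterion needs a p-value below 0.025 in a fixed-length test. Allowing five possible stopping points under the same split lowers the requirement to 0.005 per test, which is the strictness the abstract refers to.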
