Theoretical Foundations of δ-margin Majority Voting
Abstract
In high-stakes ML applications such as fraud detection, medical diagnostics, and content moderation, practitioners rely on consensus-based approaches to control prediction quality. A particularly valuable technique, δ-margin majority voting, collects votes sequentially until one label exceeds the alternatives by a threshold δ, offering stronger confidence than simple majority voting. Despite widespread adoption, this approach has lacked rigorous theoretical foundations, leaving practitioners reliant on heuristics for key metrics such as expected accuracy and cost. This paper establishes a comprehensive theoretical framework for δ-margin majority voting by formulating it as an absorbing Markov chain and leveraging Gambler's Ruin theory. Our contributions form a practical design calculus for δ-margin voting: (1) closed-form expressions for consensus accuracy, expected voting duration, variance, and the stopping-time PMF, enabling model-based design rather than trial-and-error; (2) a Bayesian extension that handles uncertainty in worker accuracy, supporting real-time monitoring of expected quality and cost as votes arrive, with single-Beta and mixture-of-Betas priors; and (3) cost-calibration methods for achieving equivalent quality across worker pools with different accuracies and for setting payment rates accordingly. We validate our predictions on two real-world datasets, demonstrating close agreement between theory and observed outcomes. The framework gives practitioners a rigorous toolkit for designing δ-margin voting processes, replacing ad-hoc experimentation with model-based design where quality control and cost transparency are essential.
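The Python sketch below makes the Gambler's Ruin formulation concrete. It relies on simplifying assumptions that the abstract does not spell out: two labels, i.i.d. votes, and a known, fixed per-vote accuracy p. Under these assumptions the margin (correct minus incorrect votes) is a simple random walk starting at 0 and absorbed at +δ or -δ. The standard Gambler's Ruin results then give P(correct consensus) = p^δ / (p^δ + q^δ) and E[T] = δ/(p - q) · (p^δ - q^δ)/(p^δ + q^δ), with q = 1 - p and E[T] = δ² when p = 1/2. These textbook results are shown for illustration only. They are not necessarily the paper's exact expressions, which also cover the variance and a Bayesian treatment of p. All function names, including the calibration helper `min_delta_for_accuracy`, are hypothetical.

```python
"""Minimal sketch: binary delta-margin majority voting as a Gambler's Ruin walk.

Assumptions (not taken from the paper): two labels, i.i.d. votes, and each vote
is correct with a known, fixed probability p in (0, 1). Names are hypothetical.
"""
from typing import List, Optional


def consensus_accuracy(p: float, delta: int) -> float:
    """P(margin hits +delta before -delta), starting from margin 0."""
    q = 1.0 - p
    return p**delta / (p**delta + q**delta)


def expected_votes(p: float, delta: int) -> float:
    """Expected number of votes until |margin| first reaches delta."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return float(delta * delta)  # symmetric walk: E[T] = delta^2
    return delta / (p - q) * (p**delta - q**delta) / (p**delta + q**delta)


def stopping_time_pmf(p: float, delta: int, max_votes: int) -> List[float]:
    """P(T = t) for t = 0..max_votes, found by propagating transient-state mass.

    Transient states are margins -(delta-1)..(delta-1); +/-delta are absorbing.
    List index i stores margin i - (delta - 1).
    """
    q = 1.0 - p
    size = 2 * delta - 1
    dist = [0.0] * size
    dist[delta - 1] = 1.0  # all mass starts at margin 0
    pmf = [0.0] * (max_votes + 1)
    for t in range(1, max_votes + 1):
        nxt = [0.0] * size
        absorbed = 0.0
        for i, mass in enumerate(dist):
            if mass == 0.0:
                continue
            # A correct vote moves the margin up by 1; a wrong vote moves it down.
            for j, w in ((i + 1, p), (i - 1, q)):
                if 0 <= j < size:
                    nxt[j] += mass * w
                else:
                    absorbed += mass * w  # stepped onto +delta or -delta
        pmf[t] = absorbed
        dist = nxt
    return pmf


def min_delta_for_accuracy(p: float, target: float, max_delta: int = 50) -> Optional[int]:
    """Hypothetical calibration helper: smallest delta reaching the target accuracy."""
    for delta in range(1, max_delta + 1):
        if consensus_accuracy(p, delta) >= target:
            return delta
    return None


if __name__ == "__main__":
    p, delta = 0.8, 2
    print(round(consensus_accuracy(p, delta), 4))  # 0.9412
    print(round(expected_votes(p, delta), 4))  # 2.9412
    pmf = stopping_time_pmf(p, delta, 60)
    print(round(sum(t * m for t, m in enumerate(pmf)), 4))  # 2.9412
    print(min_delta_for_accuracy(0.7, 0.95), min_delta_for_accuracy(0.8, 0.95))  # 4 3
```

The stopping-time PMF is computed numerically by pushing probability mass through the 2δ - 1 transient states of the chain rather than from a closed form. This makes it a convenient cross-check, because its mean should match `expected_votes`. The calibration helper is a deliberately simple stand-in for the cost-calibration idea. Under these assumptions, a pool with p = 0.7 needs δ = 4 to reach 95% consensus accuracy, while a pool with p = 0.8 needs only δ = 3.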