Too good to be true: when overwhelming evidence fails to convince
Abstract
Is it possible for a large sequence of measurements or observations, which support a hypothesis, to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith, however rarely is consideration given to whether a systemic failure has occurred. Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. We perform a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence, and (iii) cryptographic primality testing. We find that even with surprisingly low systemic failure rates high confidence is very difficult to achieve and in particular we find that certain analyses of cryptographically-important numerical tests are highly optimistic, underestimating their false-negative rate by as much as a factor of 280.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.