The Most Difference in Means: A Statistic for the Strength of Null and Near-Zero Results
Abstract
Statistical insignificance does not suggest the absence of an effect, yet scientists must often use null results as evidence of a negligible (near-zero) effect size to falsify scientific hypotheses. Doing so requires assessing a result's null strength, defined as the evidence for a negligible effect size. Such an assessment would differentiate strong null results that suggest a negligible effect size from weak null results that suggest a broad range of potential effect sizes. We propose the most difference in means (δM) as a two-sample statistic that can both quantify null strength and perform a hypothesis test for negligible effect size. To facilitate consensus when interpreting results, our statistic allows scientists to conclude that a result has a negligible effect size using different thresholds, with no recalculation required. To assist with selecting a threshold, δM can also compare null strength between related results. Both δM and the relative form of δM outperform other candidate statistics in comparing null strength. We compile broadly related results and use the relative δM to compare null strength across different treatments, measurement methods, and experiment models. Reporting the relative δM may provide a technical solution to the file drawer problem by encouraging the publication of null and near-zero results.