Detecting Where Effects Occur by Testing Hypotheses in Order
Abstract
Experimental evaluations of public policy often randomize an intervention within many sites or blocks. Once an overall effect is reported, the question that matters for action is where it occurred. Standard multiple-testing corrections answer with little power because they ignore how the experiment is organized: blocks nest within cohorts, sites, and districts. We organize the hypotheses as a tree that follows this administrative structure and test them top-down, descending into a branch only when its parent null is rejected. We show that this stopping rule and valid node-level tests suffice for weak control of the family-wise error rate (FWER). Whether the same procedure also controls the FWER in the strong sense depends on a single quantity computable before any data are seen: an error load that summarizes how rejection probability accumulates along paths through the tree. This diagnostic tells an analyst in advance, from design quantities alone, whether the unadjusted procedure controls the FWER or an adjustment is required. Across 25 block-randomized MDRC education trials, it indicates in every trial that no adjustment is needed, so the two conditions alone control the FWER while each test runs at the full nominal level; the top-down procedure detects individual blocks that the Hommel correction misses and locates higher-level groups of blocks that bottom-up testing cannot evaluate. For high-error-load designs we derive an adaptive alpha-schedule, prove that it controls the FWER on regular, irregular, and pruned trees, and confirm this in simulation. The same diagnostic flags when the schedule is needed: in a design calibrated to the National Job Corps Study, a wide multisite trial of about one hundred centers, the unadjusted procedure inflates the FWER, the adaptive schedule restores control, and top-down testing still detects more affected sites than bottom-up or hierarchical corrections.
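The top-down stopping rule described in the abstract can be illustrated with a minimal sketch. It covers only the unadjusted procedure, in which every node is tested at the full nominal level; it does not implement the error-load diagnostic or the adaptive alpha-schedule, whose definitions the abstract does not give. The function name `top_down_test`, the tree layout, the node labels, and all p-values are hypothetical and chosen purely for illustration; the sketch assumes a valid p-value is already available for each node.

```python
from typing import Dict, List


def top_down_test(
    children: Dict[str, List[str]],
    p_values: Dict[str, float],
    root: str,
    alpha: float = 0.05,
) -> List[str]:
    """Return the nodes whose nulls are rejected, visiting the tree top-down.

    A node's children are tested only if the node itself is rejected,
    so every descendant of a non-rejected node is left untested.
    """
    rejected: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if p_values[node] <= alpha:
            rejected.append(node)
            # Descend: push children so they are visited left to right.
            stack.extend(reversed(children.get(node, [])))
        # Otherwise stop here; this branch is not explored further.
    return rejected


# Hypothetical tree: overall effect -> districts -> blocks.
children = {
    "overall": ["district_A", "district_B"],
    "district_A": ["block_1", "block_2"],
    "district_B": ["block_3", "block_4"],
}

# Hypothetical node-level p-values (assumed valid tests at each node).
p_values = {
    "overall": 0.001,
    "district_A": 0.004,
    "district_B": 0.30,
    "block_1": 0.01,
    "block_2": 0.40,
    "block_3": 0.02,
    "block_4": 0.60,
}

rejected = top_down_test(children, p_values, root="overall")
print(rejected)
```

In this made-up example, `block_3` has a small p-value but is never tested, because the null for its parent `district_B` is not rejected. That gating is what distinguishes the top-down procedure from testing every block directly.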