On the statistical analysis of grouped data: when Pearson χ2 and other divisible statistics are not goodness-of-fit tests

Abstract

Thousands of experiments are analyzed, and papers are published each year involving the statistical analysis of grouped data. While this area of statistics is often perceived -- somewhat naively -- as saturated, several misconceptions still affect everyday practice, and new frontiers have so far remained unexplored. Researchers must be aware of the limitations affecting their analyses and what new possibilities are at their hands. The article introduces a unifying approach to the analysis of divisible statistics -- that includes Pearson's χ2, the likelihood ratio, and spectral statistics, as special cases -- when a statistician deals with a large number of bins/groups, thus leading to a large number of small or moderate frequencies. Performance of the tests is analyzed against the class of contiguous (local) alternatives. Perhaps the most surprising result here is that, in this `sparse' regime, most of the tests proposed in the literature can be modified to produce more powerful tests, and no single test based on a divisible statistic leads to a goodness-of-fit test. Distribution-free goodness-of-fit tests are also constructed.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…