Finite-Sample Concentration of the Multinomial in Relative Entropy
Abstract
We show that the moment generating function of the Kullback-Leibler divergence (relative entropy) between the empirical distribution of $n$ independent samples from a distribution $P$ over a finite alphabet of size $k$ (i.e. a multinomial distribution) and $P$ itself is no more than that of a gamma distribution with shape $k - 1$ and rate $n$. The resulting exponential concentration inequality becomes meaningful (less than 1) when the divergence $\varepsilon$ is larger than $(k-1)/n$, whereas the standard method of types bound requires $\varepsilon > \frac{1}{n} \cdot \log \binom{n+k-1}{k-1} \geq \frac{k-1}{n} \cdot \log\left(1 + \frac{n}{k-1}\right)$, thus saving a factor of order $\log(n/k)$ in the standard regime of parameters where $n \gg k$. As a consequence, we also obtain finite-sample bounds on all the moments of the empirical divergence (equivalently, the discrete likelihood-ratio statistic), which are within constant factors (depending on the moment) of their asymptotic values. Our proof proceeds via a simple reduction to the case $k = 2$ of a binary alphabet (i.e. a binomial distribution), and has the property that improvements in the case of $k = 2$ directly translate to improvements for general $k$. In particular, we conjecture a bound on the binomial moment generating function that would almost close the quadratic gap between our finite-sample bound and the asymptotic moment generating function bound from Wilks' theorem (which does not hold for finite samples).
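To make the comparison of thresholds concrete, the following is a minimal Python sketch, not code from the paper. The function names are hypothetical. It assumes the method of types bound has the standard form C(n+k-1, k-1) · exp(-n·eps), and it assumes the tail bound is obtained by applying the usual Chernoff argument to the gamma moment generating function (1 - t/n)^-(k-1), which gives exp(-n·eps) · (e·n·eps/(k-1))^(k-1) for eps above (k-1)/n. The abstract does not state either formula explicitly, so treat both as illustrative.

```python
import math


def gamma_threshold(n: int, k: int) -> float:
    """Divergence level (k-1)/n above which the gamma-based bound drops below 1."""
    return (k - 1) / n


def types_threshold(n: int, k: int) -> float:
    """Level above which C(n+k-1, k-1) * exp(-n*eps) drops below 1 (assumed form)."""
    return math.log(math.comb(n + k - 1, k - 1)) / n


def types_threshold_lower(n: int, k: int) -> float:
    """Lower bound (k-1)/n * log(1 + n/(k-1)) on the types threshold; needs k >= 2."""
    return (k - 1) / n * math.log1p(n / (k - 1))


def gamma_tail_bound(n: int, k: int, eps: float) -> float:
    """Chernoff bound from the gamma MGF (assumed derivation), computed in log space."""
    a = k - 1
    if eps <= a / n:
        return 1.0  # the optimised bound is trivial at or below the threshold
    log_bound = -n * eps + a * (1.0 + math.log(n * eps / a))
    return math.exp(log_bound)


def types_log_bound(n: int, k: int, eps: float) -> float:
    """Logarithm of the assumed method of types bound; positive means the bound exceeds 1."""
    return math.log(math.comb(n + k - 1, k - 1)) - n * eps


if __name__ == "__main__":
    for n, k in [(1_000, 10), (10_000, 10), (100_000, 100)]:
        g = gamma_threshold(n, k)
        t = types_threshold(n, k)
        print(f"n={n}, k={k}: gamma threshold={g:.6f}, "
              f"types threshold={t:.6f}, ratio={t / g:.2f}")
```

The printed ratio of the two thresholds grows roughly like log(n/k), which is the saving described above. For divergence levels between the two thresholds, `gamma_tail_bound` is already below 1 while `types_log_bound` is still positive.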