Estimation of the sub-Gaussian parameter
Abstract
The sub-Gaussian parameter (also called the variance proxy) of a mean-zero random variable $X$ is defined as $\xi_*^2 = \sup_{\lambda \in \mathbb{R}} L(\lambda)$, where $L(\lambda) = \frac{2}{\lambda^2} \log \mathbb{E}\, e^{\lambda X}$ is a weighted cumulant generating function. Despite the ubiquity of sub-Gaussian random variables, the estimation of $\xi_*^2$ has received little attention and is not yet well understood. In this work, we study a natural estimator of $\xi_*^2$ based on constrained maximization of the empirical analogue of $L$. We prove that the estimator is consistent and bound its rate of convergence under assumptions on $L$: if $L$ has a maximizer, then our bound is $O_p(n^{-1/2+\epsilon})$ for any $\epsilon > 0$; if the argmax of $L$ is also bounded, then the bound improves to $O_p(n^{-1/2})$. We show that our assumptions on $L$ are necessary by proving that the minimax risk over all sub-Gaussian distributions is $\Omega(1)$; imposing increasingly strong assumptions on the tail growth of $L$ yields a continuum of classes whose minimax lower bounds interpolate between $\Omega(1/\sqrt{n})$ and $\Omega(1)$. A root-$n$ rate is attainable if we restrict to a subclass of distributions for which $L$ attains its supremum in a bounded region, in which case our estimator is minimax optimal. If the underlying distribution is not sub-Gaussian, we show that our estimator diverges to infinity at a rate controlled by the tail of the distribution. Finally, we apply our estimator in a Gene Ontology (GO) enrichment study to construct p-values for a large-scale permutation test, showing that it can serve as a reliable alternative to the peaks-over-threshold approach, particularly in regimes where the validity of the peaks-over-threshold method is uncertain.
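To make the idea of the estimator concrete, the following is a minimal sketch, not the authors' implementation. It takes the empirical analogue of the weighted cumulant generating function, that is, 2/λ² times the log of the sample mean of exp(λX), and maximizes it over a bounded set of λ values. The grid search, the constraint bound `lam_max`, the sample centering, and the function names `empirical_L` and `estimate_subgaussian_parameter` are all illustrative assumptions; how the constraint should be chosen as a function of the sample size is not specified here.

```python
import numpy as np

def empirical_L(lam, x):
    """Empirical analogue of L(lam) = (2 / lam^2) * log E[exp(lam * X)].

    `lam` may be a scalar or an array of nonzero values.
    """
    x = np.asarray(x, dtype=float)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    z = lam[:, None] * x[None, :]            # shape (len(lam), n)
    m = z.max(axis=1, keepdims=True)
    # Log of the sample mean of exp(lam * x), computed stably (log-sum-exp trick).
    log_mgf = m[:, 0] + np.log(np.mean(np.exp(z - m), axis=1))
    return 2.0 * log_mgf / lam**2

def estimate_subgaussian_parameter(x, lam_max=2.0, num=400):
    """Maximize the empirical L over a symmetric grid in [-lam_max, lam_max].

    Assumptions of this sketch: the constraint is a fixed bound `lam_max`,
    the search is a plain grid that excludes lam = 0, and the sample is
    centered because the definition presumes a mean-zero variable.
    Returns (estimate, maximizing lam).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    pos = np.linspace(lam_max / num, lam_max, num)
    grid = np.concatenate([-pos[::-1], pos])
    values = empirical_L(grid, x)
    i = int(np.argmax(values))
    return values[i], grid[i]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(scale=1.5, size=5000)  # true parameter is 1.5**2
    est, lam_hat = estimate_subgaussian_parameter(sample, lam_max=1.0)
    print(f"estimate: {est:.3f} at lambda = {lam_hat:.3f}")
```

For a symmetric two-point sample on {-1, +1}, the empirical function is 2 log cosh(λ)/λ², which approaches its supremum of 1 as λ tends to 0, so the grid maximum lands at the smallest |λ| on the grid. This illustrates why the behavior of the argmax of L matters for the rates discussed above.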