High Dimensional Gaussian and Bootstrap Approximations in Generalized Linear Models
Abstract
The generalized linear model (GLM) extends ordinary linear regression by linking the mean of the response variable to covariates through an appropriate link function. GLMs are widely used to analyze datasets from diverse fields, including medical sciences, clinical trials, population surveys, and risk analysis. In this paper, we investigate Gaussian and Bootstrap approximations for GLMs under two separate high dimensional regimes: (I) when the dimension $d$ grows slower than $n$, and (II) when $d$ grows exponentially with $n$. Under regime (I), we essentially show that the Gaussian approximation holds over the collection of Borel convex sets when $d = o(n^{2/5})$ and over the collection of Euclidean balls when $d = o(n^{1/2})$. We further devise two high dimensional Bootstrap methods that are valid over the collections of Borel convex sets and Euclidean balls under the same dimension growth rates. We then move to regime (II), where we impose sparsity on the GLM through the Lasso. We show that the high dimensional Gaussian approximation fails under regime (II). However, the Bootstrap approximations over convex sets and Euclidean balls are valid for the relevant part of the GLM estimator provided $d = o(n^{2\tau/3})$ and the number of non-zero regression parameters is $o(n^{1/3 - 4\tau/3})$, when the Lasso penalty $\lambda_n$ is of order $n^{1/2 + \tau}$ for some $\tau \in (0, 1/4)$. Simulation studies confirm the strong finite-sample performance of our proposed Bootstrap methods under both regimes (I) and (II). We also apply our methods to real datasets.
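As a reading aid, statements of this kind are usually formalized as uniform bounds over a class of sets. The display below is a sketch under assumed notation, not the paper's exact formulation: $\hat{\beta}_n$ denotes the GLM estimator of $\beta \in \mathbb{R}^d$, the $\sqrt{n}$ scaling is an assumed normalization, $Z$ is a centered Gaussian vector with matching covariance, $\mathcal{A}$ is either the class of Borel convex sets or the class of Euclidean balls in $\mathbb{R}^d$, and $T_n^*$ and $P_*$ stand for a generic Bootstrap version of the statistic and the Bootstrap probability given the data.

```latex
% Assumed notation (not taken from the paper): T_n is the centered, scaled estimator.
\[
  T_n = \sqrt{n}\,\bigl(\hat{\beta}_n - \beta\bigr)
\]
% Gaussian approximation over the class \mathcal{A}:
\[
  \sup_{A \in \mathcal{A}}
  \bigl|\, P(T_n \in A) - P(Z \in A) \,\bigr| \;\longrightarrow\; 0
  \quad \text{as } n \to \infty .
\]
% Bootstrap approximation over the same class (convergence in probability):
\[
  \sup_{A \in \mathcal{A}}
  \bigl|\, P_*(T_n^* \in A) - P(T_n \in A) \,\bigr| \;\longrightarrow\; 0
  \quad \text{in probability.}
\]
```

Read this way, the growth conditions on $d$ describe how fast the dimension may increase with $n$ while the corresponding supremum still vanishes. Since every Euclidean ball is convex, the supremum over balls is taken over a smaller class, which is in line with the weaker requirement $d = o(n^{1/2})$ in that case.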