Robust and Scalable Sure Screening of Fixed Effects in Ultrahigh-Dimensional Linear Mixed Models
Abstract
In modern applications of linear mixed models, the number of candidate fixed-effects covariates can grow exponentially with the sample size, while the dependence induced by random effects and possible data contamination pose substantial challenges for existing variable screening methods. We propose a robust and computationally efficient sure screening procedure for identifying relevant fixed-effects covariates in ultrahigh-dimensional linear mixed models with known random effects. The proposed method leverages a proxy-based transformation to decouple the dependence induced by random effects, enabling screening via marginal analysis in a transformed regression model. Robustness is achieved by constructing marginal utilities based on minimum density power divergence, yielding stability under data contamination and model misspecification without sacrificing scalability. The resulting procedure, termed DPD-SISP, is shown to retain all relevant covariates (the sure screening property) with exponentially high probability under general conditions, allowing for non-Gaussian errors and nonpolynomial growth of dimensionality. In addition, DPD-SISP exhibits strong robustness properties, supported by influence-function and breakdown-point analyses. The framework is further extended to incorporate prior information through conditional screening, to mitigate correlation-induced masking via iterative refinement, and to enable robust post-screening estimation of fixed effects. Extensive simulation studies demonstrate competitive performance of DPD-SISP under ideal settings and substantial gains in stability under data contamination. Its practical utility is illustrated through an application to high-dimensional longitudinal data from the ADNI2 study. The proposed framework thus provides a unified, robust, and scalable approach to variable screening in ultrahigh-dimensional linear mixed models.
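As a rough illustration only, the following Python sketch shows how a proxy-based transformation and DPD-based marginal utilities could fit together. It is not the DPD-SISP procedure itself. The abstract does not specify the transform or the utility, so the sketch assumes a random-intercept design matrix Z, a hypothetical proxy M = (I + λZZᵀ)⁻¹ standing in for the unknown inverse covariance, a Gaussian working model for each marginal fit, and the absolute minimum-DPD slope as the marginal utility. The helper names (`proxy_transform`, `dpd_fit`, `dpd_screen`), the tuning values λ = 1 and α = 0.5, and the screening size ⌊n/log n⌋ are illustrative choices, not the authors'.

```python
import numpy as np


def proxy_transform(y, X, Z, lam=1.0):
    """Premultiply y and X by M^{1/2}, where M = (I + lam * Z Z^T)^{-1}.

    M is a hypothetical proxy for the unknown inverse covariance of Z b + e;
    the form of M and the value of lam are assumptions, not the paper's.
    """
    n = len(y)
    M = np.linalg.inv(np.eye(n) + lam * Z @ Z.T)
    evals, evecs = np.linalg.eigh(M)  # M is symmetric positive definite
    M_half = (evecs * np.sqrt(evals)) @ evecs.T  # V diag(sqrt(evals)) V^T
    return M_half @ y, M_half @ X


def dpd_fit(D, y, alpha=0.5, n_iter=200, tol=1e-8):
    """Minimum-DPD fit of the Gaussian linear model y = D beta + e.

    Fixed-point (IRLS) iterations on the DPD estimating equations:
    weights w_i = exp(-alpha r_i^2 / (2 sigma^2)) shrink large residuals.
    """
    n = len(y)
    beta = np.linalg.lstsq(D, y, rcond=None)[0]  # least-squares start
    r = y - D @ beta
    # robust starting scale: normalized median absolute deviation
    sigma2 = (1.4826 * np.median(np.abs(r - np.median(r)))) ** 2 + 1e-12
    c = n * alpha / (1.0 + alpha) ** 1.5  # correction term in the scale equation
    for _ in range(n_iter):
        w = np.exp(-alpha * r**2 / (2.0 * sigma2))
        sw = np.sqrt(w)
        # weighted least squares step for beta
        beta_new = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)[0]
        r = y - D @ beta_new
        w = np.exp(-alpha * r**2 / (2.0 * sigma2))
        # scale update: sigma^2 = sum(w r^2) / (sum(w) - n alpha (1+alpha)^{-3/2})
        sigma2 = max(np.sum(w * r**2) / max(w.sum() - c, 1e-12), 1e-12)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma2


def dpd_screen(y, X, Z, d, alpha=0.5, lam=1.0):
    """Rank covariates by |marginal minimum-DPD slope| on proxy-transformed data."""
    y_t, X_t = proxy_transform(y, X, Z, lam)
    # standardize columns so slopes are comparable (assumes clean covariates)
    X_t = (X_t - X_t.mean(axis=0)) / X_t.std(axis=0)
    n, p = X_t.shape
    util = np.empty(p)
    for j in range(p):
        D = np.column_stack([np.ones(n), X_t[:, j]])  # intercept + covariate j
        util[j] = abs(dpd_fit(D, y_t, alpha)[0][1])
    return np.argsort(util)[::-1][:d], util


# Usage on simulated random-intercept data with 10% contaminated responses
rng = np.random.default_rng(0)
m, k, p = 40, 5, 400  # clusters, cluster size, covariates
n = m * k
Z = np.kron(np.eye(m), np.ones((k, 1)))  # n x m random-intercept design
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = 3.0  # covariates 0, 1, 2 are active
y = X @ beta + Z @ rng.standard_normal(m) + rng.standard_normal(n)
bad = rng.choice(n, size=n // 10, replace=False)
y[bad] += 30.0  # gross outliers in the response
d = int(n / np.log(n))  # screening size n / log n, a common SIS heuristic
selected, util = dpd_screen(y, X, Z, d)
print("kept", d, "covariates; active ones retained:",
      set(range(3)) <= set(selected.tolist()))
```

In this sketch the reweighting is what supplies robustness: an observation far from its marginal fit receives a weight of roughly exp(−α r²/(2σ²)), so gross outliers barely move the slope, while letting α → 0 gives unit weights and recovers ordinary least squares, i.e. a correlation-type marginal screen.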