Tuning-Free Efficient Estimation for Multi-Source Data via Covariance-Aware Shrinkage
Abstract
Modern statistical learning problems often involve multiple related data sets, where learning efficiency on a target set can be improved by utilizing related source sets, while heterogeneity among the source sets may introduce bias. Existing approaches are limited by suboptimal performance in multi-source settings, insufficient use of covariance information, or the computational burden of tuning procedures. We propose a tuning-free and covariance-aware shrinkage framework that constructs shrinkage directions using covariance information to improve efficiency. We establish finite-sample risk bounds that yield an explicit risk-improving interval for the shrinkage size, making the procedure fully data-driven and tuning-free. When multiple source sets are available, we further propose a novel sequential algorithm that shrinks the estimator toward the sources one at a time according to their estimated risk reduction. The proposed algorithm asymptotically attains the oracle risk under mild conditions and is guaranteed to improve over the single-step shrinkage method in the literature. The framework is further extended to general smooth \(M\)-estimation problems via a local quadratic approximation. Numerical studies show substantial gains over competing methods, especially when the source data sets are highly heterogeneous.
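To make the idea of a covariance-aware shrinkage direction concrete, the following is a minimal sketch of one generic way to shrink a target estimate toward a single source estimate. It is an illustration under stated assumptions, not the estimator defined in the paper: the function name `shrink_toward_source`, the specific direction `cov_t (cov_t + cov_s)^{-1} (theta_t - theta_s)`, and the treatment of the shrinkage size `lam` as a user-supplied argument are all hypothetical. The abstract states that the paper derives a data-driven interval for the shrinkage size, which this sketch does not attempt to reproduce.

```python
import numpy as np

def shrink_toward_source(theta_t, cov_t, theta_s, cov_s, lam):
    """Move a target estimate toward a source estimate.

    Illustrative form (an assumption, not taken from the paper):
        theta_t - lam * cov_t @ inv(cov_t + cov_s) @ (theta_t - theta_s)

    theta_t, theta_s : target and source estimates, shape (p,)
    cov_t, cov_s     : their estimated covariance matrices, shape (p, p)
    lam              : shrinkage size; lam = 0 returns the target estimate
    """
    theta_t = np.asarray(theta_t, dtype=float)
    theta_s = np.asarray(theta_s, dtype=float)
    cov_t = np.asarray(cov_t, dtype=float)
    cov_s = np.asarray(cov_s, dtype=float)
    diff = theta_t - theta_s
    # Solve (cov_t + cov_s) x = diff rather than forming an explicit inverse.
    direction = cov_t @ np.linalg.solve(cov_t + cov_s, diff)
    return theta_t - lam * direction

# Toy inputs (made up for illustration only).
theta_target = np.array([1.0, 2.0])
theta_source = np.array([3.0, 0.0])
identity = np.eye(2)

no_shrink = shrink_toward_source(theta_target, identity, theta_source, identity, lam=0.0)
full_shrink = shrink_toward_source(theta_target, identity, theta_source, identity, lam=1.0)
```

With equal covariances and `lam = 1`, this form reduces to the simple average of the two estimates, which is the usual inverse-variance combination when the source is unbiased for the target. A heterogeneous (biased) source would call for a smaller `lam`, which is the trade-off the abstract describes.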