Is Spurious Correlation Removal Always Learnable?
Abstract
Invariant learning can fail even when the invariant structure is statistically identifiable. We show a conditional computational barrier: under a black-box samplable supervised sparse-recovery primitive, motivated by average-case sparse-recovery reductions, there exist samplable multi-environment instances with a one-dimensional predictive invariant subspace ($k=1$) that are learnable from polynomially many samples by exhaustive search, yet any polynomial-time algorithm achieving constant-accuracy recovery would violate the assumed hardness of the primitive. We further quantify environment diversity by a separation parameter $\gamma$, which controls both identifiability and the curvature of invariance objectives. Under sufficient diversity and local Gaussian regularity, the minimax risk satisfies $\mathbb{E}[\mathrm{dist}(\hat V, V_{\mathrm{inv}})^2] = \Theta\big(k(d-k)/(n|\mathcal{E}|)\big)$, and under label-induced shifts a phase transition occurs at $n^* \asymp k(d-k)/(|\mathcal{E}|\gamma^2)$, with a refined estimation error that scales as $1/\gamma^2$. Experiments on synthetic and real datasets illustrate the predicted gaps and transitions and motivate simple diversity diagnostics.
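To make the stated scalings concrete, here is a minimal sketch that evaluates the minimax rate k(d-k)/(n|E|) and the phase-transition sample size k(d-k)/(|E|γ²) with the unknown constants hidden by Θ(·) and ≍ set to 1 (an assumption; the abstract gives only orders of growth). For the k = 1 case it also computes the squared sine of the angle between two lines, which is one standard way to measure subspace error; the abstract does not spell out its distance, so this choice is illustrative. All function names are hypothetical helpers, not the paper's code.

```python
import math


def sin2_angle(u, v):
    """Squared sine of the angle between the lines spanned by u and v.

    Hypothetical stand-in for the subspace distance when k = 1: this equals
    the squared Frobenius norm of sin(Theta) between span(u) and span(v).
    """
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    cos2 = dot * dot / (nu * nv)
    return max(0.0, 1.0 - cos2)  # clip tiny negative values from rounding


def minimax_rate(k, d, n, num_envs):
    """Order of the minimax risk, k(d-k)/(n|E|), with the constant set to 1."""
    return k * (d - k) / (n * num_envs)


def phase_transition_n(k, d, num_envs, gamma):
    """Order of the threshold n*, k(d-k)/(|E| gamma^2), with the constant set to 1."""
    return k * (d - k) / (num_envs * gamma ** 2)


if __name__ == "__main__":
    k, d, envs = 1, 50, 5
    # Doubling the number of environments halves the rate at fixed n.
    print(minimax_rate(k, d, 1000, envs), minimax_rate(k, d, 1000, 2 * envs))
    # Halving the separation gamma multiplies the threshold by four.
    for gamma in (0.5, 0.25):
        print(gamma, math.ceil(phase_transition_n(k, d, envs, gamma)))
```

The `1/γ²` dependence is what makes low-diversity environment sets expensive: in this sketch, halving γ quadruples the sample size needed to cross the transition, while adding environments only helps linearly.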