Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models
Abstract
We study a prototypical situation in which a learned predictor can discover useful low-dimensional structure in data while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on the input $x$ only through its projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that, under reasonable assumptions, the top $r$-dimensional eigenspace of the AGOP provably recovers the central subspace, even in regimes where the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that on the order of $d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low-degree-$p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime where $n$ is of order $d^{p+\delta}$, for any $\delta\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
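The two-step procedure described above (fit KRR, then take the top eigenspace of the AGOP) can be sketched in a few lines of NumPy. This is only an illustrative sketch under assumptions that are not taken from the abstract: a Gaussian kernel, a hand-picked bandwidth and ridge parameter, Gaussian inputs, the AGOP averaged over the training points, and a toy quadratic target $h(z)=z_1^2-z_2^2$. The function names (`gaussian_kernel`, `krr_agop`, `top_eigenspace`, `make_demo`, `alignment`) are hypothetical helpers, and the paper's precise setting may differ.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def krr_agop(X, y, bandwidth, ridge):
    """Fit KRR on (X, y) and return the AGOP of the fitted predictor,
    averaged over the training points (an assumption of this sketch)."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    # KRR coefficients: f_hat(x) = sum_i alpha_i * k(x, x_i)
    alpha = np.linalg.solve(K + ridge * np.eye(n), y)
    # Gradient of the Gaussian-kernel predictor at each training point x_j:
    # grad f_hat(x_j) = sum_i alpha_i * k(x_j, x_i) * (x_i - x_j) / bandwidth^2
    W = K * alpha[None, :]
    grads = (W @ X - W.sum(axis=1, keepdims=True) * X) / bandwidth**2
    # AGOP: average of the outer products grad grad^T, a d x d matrix
    return grads.T @ grads / n


def top_eigenspace(M, r):
    """Orthonormal basis (d x r) of the top-r eigenspace of a symmetric matrix."""
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, -r:]


def make_demo(n=600, d=6, r=2, seed=0):
    """Toy multi-index data: y = h(Ux) with h(z) = z_1^2 - z_2^2 (illustrative)."""
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((d, r)))[0].T  # r x d, orthonormal rows
    X = rng.standard_normal((n, d))
    Z = X @ U.T
    y = Z[:, 0] ** 2 - Z[:, 1] ** 2
    return X, y, U


def alignment(U, V):
    """Fraction of the true subspace captured by V: ||U V||_F^2 / r, in [0, 1]."""
    return np.linalg.norm(U @ V) ** 2 / U.shape[0]


X, y, U = make_demo()
M = krr_agop(X, y, bandwidth=np.sqrt(X.shape[1]), ridge=1e-3)
V = top_eigenspace(M, r=U.shape[0])
print("subspace alignment:", alignment(U, V))
```

An alignment close to 1 means that the span of `V` nearly coincides with the row space of `U`. The toy problem here is small enough that KRR also predicts well, so it illustrates only the mechanics of the estimator, not the separation between prediction and subspace recovery that the abstract establishes.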