Sample-Efficient Linear Regression with Self-Selection Bias
Abstract
We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $\{(x_\ell, z_\ell)\}_{\ell=1}^m$ where $z_\ell = \max_{i \in [k]} \{x_\ell^\top w_i + \eta_{i,\ell}\}$, but the maximizing index $i$ is unobserved. Here, the $x_\ell$ are assumed to be $\mathcal{N}(0, I_n)$, and the noise $\eta_\ell \sim \mathcal{D}$ is centered and independent of $x_\ell$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $w_1, \ldots, w_k \in \mathbb{R}^n$ up to additive $\ell_2$-error $\epsilon$ with polynomial sample complexity $\tilde{O}(n) \cdot \mathrm{poly}(k, 1/\epsilon)$ and significantly improved time complexity $\mathrm{poly}(n, k, 1/\epsilon) + O(\log(k)/\epsilon)^{O(k)}$. When $k = O(1)$, our algorithm runs in $\mathrm{poly}(n, 1/\epsilon)$ time, generalizing the polynomial guarantee of an explicit moment matching algorithm of Cherapanamjeri et al. for $k = 2$ and when it is known that $\mathcal{D} = \mathcal{N}(0, I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression, where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\epsilon$, and it leads to improved algorithms for any $\epsilon$ by providing a warm start for existing local convergence methods.
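To make the two observation models concrete, here is a minimal sketch of their data-generating processes only (it is not the paper's recovery algorithm). It assumes Gaussian noise, i.e. $\mathcal{D} = \mathcal{N}(0, \sigma^2 I_k)$, which is just one admissible choice of centered noise; the function names `sample_self_selection` and `sample_max_linear` are hypothetical. The only difference between the two is whether the noise enters inside or outside the maximum, and in both the maximizing index is discarded.

```python
import random
from typing import List, Tuple

Vector = List[float]


def _dot(x: Vector, w: Vector) -> float:
    """Inner product <x, w>."""
    return sum(xj * wj for xj, wj in zip(x, w))


def sample_self_selection(
    w: List[Vector], m: int, noise_std: float = 1.0, seed: int = 0
) -> Tuple[List[Vector], List[float]]:
    """Hypothetical sampler for the unknown-index self-selection model.

    x ~ N(0, I_n); eta ~ N(0, noise_std^2 I_k) (Gaussian noise is an assumption);
    z = max_i (<x, w_i> + eta_i). Noise is added to each score *inside* the max.
    """
    rng = random.Random(seed)
    n = len(w[0])
    xs, zs = [], []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scores = [_dot(x, wi) + rng.gauss(0.0, noise_std) for wi in w]
        xs.append(x)
        zs.append(max(scores))  # only the max value is observed, not its index
    return xs, zs


def sample_max_linear(
    w: List[Vector], m: int, noise_std: float = 1.0, seed: int = 0
) -> Tuple[List[Vector], List[float]]:
    """Hypothetical sampler for max-linear regression.

    z = max_i <x, w_i> + eta with a single scalar eta added *outside* the max.
    """
    rng = random.Random(seed)
    n = len(w[0])
    xs, zs = [], []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        zs.append(max(_dot(x, wi) for wi in w) + rng.gauss(0.0, noise_std))
        xs.append(x)
    return xs, zs


if __name__ == "__main__":
    # k = 2 regressors in n = 3 dimensions (illustrative values, not from the paper).
    W = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]
    xs, zs = sample_self_selection(W, m=5, noise_std=0.1, seed=42)
    for x, z in zip(xs, zs):
        print([round(v, 3) for v in x], round(z, 3))
```

With `noise_std=0.0` both samplers coincide and return $z = \max_i x^\top w_i$ exactly, which is a quick sanity check; the recovery problem is to estimate every $w_i$ from the pairs $(x, z)$ alone.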