Sampling Algorithms and Coresets for Lp Regression
Abstract
The Lp regression problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$, a vector $b \in \mathbb{R}^n$, and a number $p \in [1,\infty)$, and it returns as output a number $Z$ and a vector $x_{opt} \in \mathbb{R}^d$ such that $Z = \min_{x \in \mathbb{R}^d} \|Ax - b\|_p = \|Ax_{opt} - b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n \gg d$) version of this classical problem, for all $p \in [1, \infty)$. The first stage of our algorithm non-uniformly samples $r_1 = O(36^p d^{\max\{p/2+1,\, p\}+1})$ rows of $A$ and the corresponding elements of $b$, and then it solves the Lp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample $r_1/\epsilon^2$ constraints, and then it solves the Lp regression problem on the new sample; we prove this is a $(1+\epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of Lp regression, namely $p = 1, 2$. In the course of proving our result, we develop two concepts, well-conditioned bases and subspace-preserving sampling, that are of independent interest.
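The two-stage structure described in the abstract can be illustrated with a minimal sketch for the special case p = 2 only. Everything specific here is an assumption for illustration and is not stated in the abstract: the function names, the use of a QR factorization as the well-conditioned basis, first-stage probabilities proportional to squared row norms of that basis, and second-stage probabilities that additionally weight each row by its first-stage residual. The sample-size parameters `r1` and `r2` are free inputs, not the bounds proved in the paper.

```python
import numpy as np

def sample_and_solve(A, b, probs, rng):
    """Keep row i independently with probability probs[i], rescale, and solve."""
    keep = rng.random(A.shape[0]) < probs
    # For p = 2 each kept row is rescaled by 1 / sqrt(probs[i]).
    w = 1.0 / np.sqrt(probs[keep])
    x, *_ = np.linalg.lstsq(A[keep] * w[:, None], b[keep] * w, rcond=None)
    return x

def two_stage_l2(A, b, r1, r2, seed=0):
    """Hypothetical two-stage sampling sketch for least squares (p = 2)."""
    rng = np.random.default_rng(seed)
    # Stage 1 (assumed rule): probabilities from squared row norms of an
    # orthonormal basis for the column space of A.
    Q, _ = np.linalg.qr(A)
    row_norms = np.sum(Q**2, axis=1)
    p1 = np.minimum(1.0, r1 * row_norms / row_norms.sum())
    x1 = sample_and_solve(A, b, p1, rng)
    # Stage 2 (assumed rule): also favor rows with large stage-1 residual.
    res = (A @ x1 - b) ** 2
    p2 = np.minimum(1.0, np.maximum(p1, r2 * res / res.sum()))
    x2 = sample_and_solve(A, b, p2, rng)
    return x1, x2
```

A call such as `two_stage_l2(A, b, r1=200, r2=200)` returns the first-stage and second-stage solutions, whose costs `np.linalg.norm(A @ x - b)` can be compared against the full least-squares optimum. The sketch assumes the first-stage residual is not identically zero.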