Memory-Sample Tradeoffs for Linear Regression with Small Error

Abstract

We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a_1, b_1), (a_2, b_2), …, with a_i drawn independently from a d-dimensional isotropic Gaussian, and where b_i = ⟨a_i, x⟩ + η_i, for a fixed x ∈ ℝ^d with ‖x‖_2 = 1 and with independent noise η_i drawn uniformly from the interval [-2^{-d/5}, 2^{-d/5}]. We show that any algorithm with at most d^2/4 bits of memory requires at least Ω(d log log(1/ε)) samples to approximate x to ℓ_2 error ε with probability of success at least 2/3, for ε sufficiently small as a function of d. In contrast, for such ε, x can be recovered to error ε with probability 1 - o(1) with memory O(d^2 log(1/ε)) using d examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.
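The data model in the abstract can be made concrete with a short simulation. The sketch below is only an illustration under stated assumptions, not the paper's algorithm or its lower-bound argument: the function names (`sample_stream`, `recover_unconstrained`, `recovery_error`) are hypothetical, NumPy is assumed, and the "unconstrained" learner simply stores all d examples and solves the resulting d × d linear system, which is one plausible way to use roughly quadratic memory with d examples.

```python
import numpy as np


def sample_stream(x, n, rng):
    """Draw n labeled examples (a_i, b_i) from the model in the abstract.

    a_i is a d-dimensional isotropic Gaussian, and
    b_i = <a_i, x> + eta_i with eta_i uniform on [-2^(-d/5), 2^(-d/5)].
    """
    d = x.shape[0]
    A = rng.standard_normal((n, d))          # rows are the examples a_i
    noise_bound = 2.0 ** (-d / 5)            # half-width of the noise interval
    eta = rng.uniform(-noise_bound, noise_bound, size=n)
    b = A @ x + eta
    return A, b


def recover_unconstrained(A, b):
    """Assumed memory-unconstrained baseline: keep all d examples
    (about d^2 numbers) and solve the square system A x = b."""
    return np.linalg.solve(A, b)


def recovery_error(d=200, seed=0):
    """Return the l2 error of the baseline on one random instance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)                   # enforce ||x||_2 = 1
    A, b = sample_stream(x, d, rng)          # exactly d examples
    x_hat = recover_unconstrained(A, b)
    return float(np.linalg.norm(x_hat - x))


if __name__ == "__main__":
    print(f"l2 recovery error with d examples: {recovery_error():.3e}")
```

Because the noise shrinks exponentially in d, the solve recovers x to small error from only d examples once d is moderately large. The abstract's lower bound concerns what happens when the learner cannot afford to keep those d examples, that is, when it has at most d^2/4 bits of memory.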
