Testing noisy linear functions for sparsity

Rocco A. Servedio

Testing noisy linear functions for sparsity

Abstract

We consider the following basic inference problem: there is an unknown high-dimensional vector w ∈ Rn, and an algorithm is given access to labeled pairs (x,y) where x ∈ Rn is a measurement and y = w · x + noise. What is the complexity of deciding whether the target vector w is (approximately) k-sparse? The recovery analogue of this problem --- given the promise that w is sparse, find or approximate the vector w --- is the famous sparse recovery problem, with a rich body of work in signal processing, statistics, and computer science. We study the decision version of this problem (i.e.~deciding whether the unknown w is k-sparse) from the vantage point of property testing. Our focus is on answering the following high-level question: when is it possible to efficiently test whether the unknown target vector w is sparse versus far-from-sparse using a number of samples which is completely independent of the dimension n? We consider the natural setting in which x is drawn from a i.i.d.~product distribution D over Rn and the noise process is independent of the input x. As our main result, we give a general algorithm which solves the above-described testing problem using a number of samples which is completely independent of the ambient dimension n, as long as D is not a Gaussian. In fact, our algorithm is fully noise tolerant, in the sense that for an arbitrary w, it approximately computes the distance of w to the closest k-sparse vector. To complement this algorithmic result, we show that weakening any of our condition makes it information-theoretically impossible for any algorithm to solve the testing problem with fewer than essentially n samples.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…