Detecting Low-Degree Truncation

Abstract

We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution $\mathcal{D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from $\mathcal{D}$, versus (ii) the data was drawn from $\mathcal{D}|_S$, i.e. from $\mathcal{D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$. We study this problem in the setting where $\mathcal{D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when $\mathcal{D}$ is a hypercontractive distribution, and a matching lower bound: for any constant $d$, we give a polynomial-time algorithm which successfully distinguishes $\mathcal{D}$ from $\mathcal{D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on $\mathcal{D}$ and $S$); even for the simplest case of $\mathcal{D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $\Omega(n^{d/2})$ samples.
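
To make the two hypotheses concrete, the following is a minimal Python sketch of the data-generating side of the problem only, for the simplest case named above (the uniform distribution over the Boolean hypercube). It is not the paper's distinguishing algorithm. The specific degree-2 polynomial, the function names, and the use of rejection sampling are all illustrative assumptions rather than anything taken from the paper.

```python
import random


def sample_uniform_cube(n, rng):
    """Draw one point from the uniform distribution over {+1, -1}^n."""
    return [rng.choice((1, -1)) for _ in range(n)]


def in_example_set(x):
    """Membership in a hypothetical truncation set S = {x : p(x) >= 0}.

    p(x) = x[0]*x[1] + x[2]*x[3] - 1 is an arbitrary degree-2 polynomial
    chosen for illustration; in the actual problem S is unknown.
    """
    return x[0] * x[1] + x[2] * x[3] - 1 >= 0


def sample_truncated(n, in_set, rng, max_tries=10_000):
    """Draw one point from the distribution conditioned on S (rejection sampling)."""
    for _ in range(max_tries):
        x = sample_uniform_cube(n, rng)
        if in_set(x):
            return x
    raise RuntimeError("S appears to have very small mass under the distribution")


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 8

# Hypothesis (i): data drawn from the untruncated distribution.
untruncated = [sample_uniform_cube(n, rng) for _ in range(5)]

# Hypothesis (ii): data drawn from the distribution truncated to S.
truncated = [sample_truncated(n, in_example_set, rng) for _ in range(5)]
```

A distinguishing algorithm receives only one such list of samples, without being told which hypothesis produced it and without access to `in_example_set`.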
