Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Abstract

We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n \geq 2k-1$ is necessary and sufficient for identification. We show, for any $n \geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
