A Distribution Testing Approach to Clustering Distributions

Abstract

We study the following distribution clustering problem: k distributions are partitioned into two hidden groups such that all distributions within a group are identical, and the two distributions associated with the two clusters are ε-far in total variation. The goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when the distribution of one of the clusters is known, and (2) when both are unknown. Our upper and lower bounds characterize the dependence of the sample complexity on the domain size n, the number of distributions k, the size r of one of the clusters, and the distance ε. In particular, our bounds are tight with respect to (n, k, r, ε), up to an O(log k) factor, in all regimes.
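To make the problem setup concrete, the following is a minimal sketch of case (1), where one cluster's distribution is known. It is not the paper's algorithm: it is a naive plug-in baseline that estimates each source's empirical distribution and thresholds its total variation distance to the known distribution at ε/2, which generally costs more samples than a testing-based approach. The names `make_instance`, `draw`, and `naive_cluster_known`, the uniform choice of the known distribution, and the particular ε-far perturbation are all assumptions made for illustration.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two distributions on the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def make_instance(n, k, r, eps, seed=0):
    """Build a toy instance (hypothetical construction, for illustration only).

    p is uniform on n elements; q shifts mass from the second half of the
    domain to the first half so that TV(p, q) = eps. Exactly r of the k
    sources are secretly assigned to q (label 1); the rest use p (label 0).
    """
    assert n % 2 == 0 and 0 < eps <= 0.5 and 0 < r < k
    rng = random.Random(seed)
    p = [1.0 / n] * n
    q = [(1 + 2 * eps) / n] * (n // 2) + [(1 - 2 * eps) / n] * (n // 2)
    far = set(rng.sample(range(k), r))
    labels = [1 if i in far else 0 for i in range(k)]
    return p, q, labels, rng


def draw(dist, m, rng):
    """Draw m i.i.d. samples from dist over the domain {0, ..., n-1}."""
    return rng.choices(range(len(dist)), weights=dist, k=m)


def naive_cluster_known(samples, p, eps):
    """Label a source 1 if its empirical distribution is more than eps/2 from p."""
    n = len(p)
    out = []
    for s in samples:
        counts = Counter(s)
        emp = [counts[i] / len(s) for i in range(n)]
        out.append(1 if tv_distance(emp, p) > eps / 2 else 0)
    return out


# Usage: a small instance with many samples per source, so the plug-in
# estimate of each distribution is accurate.
p, q, labels, rng = make_instance(n=10, k=6, r=2, eps=0.4)
samples = [draw(q if lab else p, 5000, rng) for lab in labels]
recovered = naive_cluster_known(samples, p, 0.4)
```

With enough samples per source, `recovered` should match the hidden `labels`. The question the abstract poses is how few samples suffice as a function of n, k, r, and ε.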
