Consistent procedures for cluster tree estimation and pruning

Abstract

For a density f on Rd, a high-density cluster is any connected component of \x: f(x) ≥ λ\, for some λ > 0. The set of all high-density clusters forms a hierarchy called the cluster tree of f. We present two procedures for estimating the cluster tree given samples from f. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the k-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that guarantees, under milder conditions than usual, to remove clusters that are spurious while recovering those that are salient.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…