Parallel Hierarchical Agglomerative Clustering in Low Dimensions

Abstract

Hierarchical Agglomerative Clustering (HAC) is an extensively studied and widely used method for hierarchical clustering in $\mathbb{R}^k$ based on repeatedly merging the closest pair of clusters according to an input linkage function $d$. Highly parallel (i.e., NC) algorithms are known for $(1+\varepsilon)$-approximate HAC (where near-minimum rather than minimum pairs are merged) for certain linkage functions that monotonically increase as merges are performed. However, no such algorithms are known for many important but non-monotone linkage functions such as centroid and Ward's linkage. In this work, we show that a general class of non-monotone linkage functions -- which includes centroid and Ward's distance -- admits efficient NC algorithms for $(1+\varepsilon)$-approximate HAC in low dimensions. Our algorithms are based on a structural result which may be of independent interest: the height of the hierarchy resulting from any constant-approximate HAC on $n$ points for this class of linkage functions is at most $\mathrm{poly}(\log n)$ as long as $k = O(\log\log n / \log\log\log n)$. Complementing our upper bounds, we show that NC algorithms for HAC with these linkage functions in arbitrary dimensions are unlikely to exist by showing that HAC is CC-hard when $d$ is centroid distance and $k = n$.
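
To make the merge rule and the notion of non-monotonicity concrete, the following is a minimal sketch of naive, sequential, exact HAC with centroid linkage. It is not the paper's parallel algorithm; the function name `centroid_hac`, the cluster representation, and the three example points are all assumptions chosen for illustration. It repeatedly merges the pair of clusters whose centroids are closest and records each merge distance along with the height of the resulting hierarchy. It expects at least one input point.

```python
import math
from itertools import combinations


def centroid_hac(points):
    """Naive sequential exact HAC with centroid linkage (illustrative only).

    Returns (merge_distances, height), where merge_distances lists the
    centroid distance of each merge in order and height is the height of
    the resulting hierarchy (a single point has height 0).
    """
    # Each cluster is stored as (centroid, size, subtree height).
    clusters = {i: (tuple(p), 1, 0) for i, p in enumerate(points)}
    next_id = len(points)
    merge_distances = []

    while len(clusters) > 1:
        # Pick the pair of clusters with the closest centroids.
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: math.dist(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (ca, na, ha) = clusters.pop(a)
        (cb, nb, hb) = clusters.pop(b)
        merge_distances.append(math.dist(ca, cb))

        # The merged centroid is the size-weighted average of the two centroids.
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters[next_id] = (merged, na + nb, 1 + max(ha, hb))
        next_id += 1

    ((_, _, height),) = clusters.values()
    return merge_distances, height


# Hypothetical example: a nearly equilateral triangle.
dists, height = centroid_hac([(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)])
print(dists, height)
```

In this example the first merge joins the two base points at distance 2.0, after which their centroid (1, 0) lies only 1.8 from the apex, so the second merge distance is smaller than the first. A monotone linkage function would never produce such a decrease, and this is the behavior the abstract refers to as non-monotone.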
