Greedy bi-criteria approximations for k-medians and k-means

Abstract

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of k-medians and k-means, the key results are as follows. When the method considers all data points as candidate centers, then selecting O(k(1/)) centers achieves cost at most 2+ times the optimal cost with k centers. Alternatively, the same guarantees hold if each round samples O(k/5) candidate centers proportionally to their cluster cost (as with kmeans++, but holding centers fixed). In the case of k-means, considering an augmented set of n1/ candidate centers gives 1+ approximation with O(k(1/)) centers, the entire algorithm taking O(dk(1/)n1+1/) time, where n is the number of data points in Rd. In the case of Euclidean k-medians, generating a candidate set via nO(1/2) executions of stochastic gradient descent with adaptively determined constraint sets will once again give approximation 1+ with O(k(1/)) centers in dk(1/)nO(1/2) time. Ancillary results include: guarantees for cluster costs based on powers of metrics; a brief, favorable empirical evaluation against kmeans++; data-dependent bounds allowing 1+ in the first two bullets above, for example with k-medians over finite metric spaces.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…