A bi-criteria approximation algorithm for k Means
Abstract
We consider the classical k-means clustering problem in the setting of bi-criteria approximation, in which an algorithm is allowed to output βk > k clusters, and must produce a clustering with cost at most α times the cost of the optimal set of k clusters. We argue that this approach is natural in many settings, for which the exact number of clusters is a priori unknown, or unimportant up to a constant factor. We give new bi-criteria approximation algorithms, based on linear programming and local search, respectively, which attain a guarantee α(β) depending on the number βk of clusters that may be opened. Our guarantee α(β) is always at most 9 + ε and improves rapidly with β (for example: α(2) < 2.59, and α(3) < 1.4). Moreover, our algorithms have only polynomial dependence on the dimension of the input data, and so are applicable in high-dimensional settings.
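To make the bi-criteria definition concrete, the following minimal Python sketch computes the k-means cost of a set of centers and checks the two criteria: at most βk centers, and cost at most α times a reference cost for k clusters. It is not the paper's linear programming or local search algorithm. The function names `kmeans_cost` and `is_bicriteria` and the toy data are assumptions made for illustration, and the reference cost is supplied by the caller, since the optimal k-clustering cost is not known in general.

```python
def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)
        for point in points
    )


def is_bicriteria(points, centers, k, alpha, beta, reference_cost):
    """Check the (alpha, beta) definition against a caller-supplied reference cost.

    reference_cost stands in for the cost of the optimal set of k clusters
    (hypothetical input: in practice this value is usually unknown).
    """
    within_budget = len(centers) <= beta * k
    within_cost = kmeans_cost(points, centers) <= alpha * reference_cost
    return within_budget and within_cost


# Toy one-dimensional instance with k = 1.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
one_center = [(5.5,)]             # the mean, which is optimal for k = 1
two_centers = [(0.5,), (10.5,)]   # beta = 2 allows opening 2 centers

reference = kmeans_cost(points, one_center)
print(reference, kmeans_cost(points, two_centers))
print(is_bicriteria(points, two_centers, k=1, alpha=1.0, beta=2, reference_cost=reference))
```

On this toy instance, opening twice as many centers drops the cost far below the k = 1 reference, which is the kind of trade-off between β and α that the guarantee α(β) quantifies.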