Convergence rate of stochastic k-means
Abstract
We analyze online and mini-batch k-means variants. Both scale up the widely used Lloyd 's algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards local optima at O(1t) rate under general conditions. In addition, we show if the dataset is clusterable, with suitable initialization, mini-batch k-means converges to an optimal k-means solution with O(1t) convergence rate with high probability. The k-means objective is non-convex and non-differentiable: we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about k-means update.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.