Convergence rate of stochastic k-means

Abstract

We analyze online BottouBengio and mini-batch Sculley k-means variants. Both scale up the widely used k-means algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that starting with any initial solution, they converge to a "local optimum" at rate O(1t) (in terms of the k-means objective) under general conditions. In addition, we show if the dataset is clusterable, when initialized with a simple and scalable seeding algorithm, mini-batch k-means converges to an optimal k-means solution at rate O(1t) with high probability. The k-means objective is non-convex and non-differentiable: we exploit ideas from recent work on stochastic gradient descent for non-convex problems ge:sgdtensor, balsubramani13 by providing a novel characterization of the trajectory of k-means algorithm on its solution space, and circumvent the non-differentiability problem via geometric insights about k-means update.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…