Approximation and Streaming Algorithms for Projective Clustering via Random Projections

Abstract

Let $P$ be a set of $n$ points in $\mathbb{R}^d$. In the projective clustering problem, given $k$, $q$ and a norm $\rho \in [1,\infty]$, we have to compute a set $F$ of $k$ $q$-dimensional flats such that $\left(\sum_{p\in P} d(p, F)^{\rho}\right)^{1/\rho}$ is minimized; here $d(p, F)$ denotes the (Euclidean) distance of $p$ to the closest flat in $F$. We let $f_k^q(P,\rho)$ denote the minimal value and interpret $f_k^q(P,\infty)$ to be $\max_{r\in P} d(r, F)$. When $\rho = 1$, $2$ and $\infty$ and $q=0$, the problem corresponds to the $k$-median, $k$-means and $k$-center clustering problems, respectively. For every $0 < \varepsilon < 1$, $S\subset P$ and $\rho \ge 1$, we show that the orthogonal projection of $P$ onto a randomly chosen flat of dimension $O(((q+1)^2\log(1/\varepsilon)/\varepsilon^3)\log n)$ will $\varepsilon$-approximate $f_1^q(S,\rho)$. This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson-Lindenstrauss Lemma. As a consequence, an orthogonal projection of $P$ to an $O(((q+1)^2\log((q+1)/\varepsilon)/\varepsilon^3)\log n)$-dimensional randomly chosen subspace $\varepsilon$-approximates projective clusterings for every $k$ and $\rho$ simultaneously. Note that the dimension of this subspace is independent of the number of clusters $k$. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of $n$ points, we show how to compute an $\varepsilon$-approximate projective clustering for every $k$ and $\rho$ simultaneously using only $O((n+d)((q+1)^2\log((q+1)/\varepsilon))/\varepsilon^3 \log n)$ space. Compared to standard streaming algorithms with an $\Omega(kd)$ space requirement, our approach is a significant improvement when the number of input points and their dimension are of the same order of magnitude.
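
The objective and the dimension-reduction idea can be illustrated with a small sketch. It is not the paper's algorithm: it uses a scaled Gaussian matrix as a common Johnson-Lindenstrauss-style stand-in for an orthogonal projection onto a random flat, it only treats the case $q=0$ in the demo, and it compares the cost of one fixed candidate solution before and after projection rather than the optimal value $f_k^q$. All function names, the target dimension `m = 100` and the synthetic data are assumptions chosen for illustration; `m` is not derived from the bound stated above.

```python
import math
import random


def dist_to_flat(p, origin, basis):
    """Euclidean distance from p to the flat origin + span(basis).

    `basis` is assumed to be orthonormal; an empty basis means the
    flat is a single point (q = 0).
    """
    v = [a - b for a, b in zip(p, origin)]
    for u in basis:
        c = sum(a * b for a, b in zip(v, u))
        v = [a - c * b for a, b in zip(v, u)]  # remove the component along u
    return math.sqrt(sum(a * a for a in v))


def clustering_cost(points, flats, rho):
    """(sum_p d(p, F)^rho)^(1/rho); rho = math.inf gives max_p d(p, F)."""
    dists = [min(dist_to_flat(p, o, b) for o, b in flats) for p in points]
    if math.isinf(rho):
        return max(dists)
    return sum(x ** rho for x in dists) ** (1.0 / rho)


def gaussian_projection(d, m, rng):
    """m x d matrix with i.i.d. N(0, 1/m) entries (assumed stand-in map)."""
    s = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, s) for _ in range(d)] for _ in range(m)]


def project(matrix, p):
    """Apply the linear map `matrix` to the point p."""
    return [sum(a * b for a, b in zip(row, p)) for row in matrix]


def demo(seed=0, n=60, d=200, m=100, k=3):
    """Ratio (projected cost / original cost) for rho = 1, 2, inf."""
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    centers = rng.sample(points, k)  # hypothetical candidate solution, q = 0
    g = gaussian_projection(d, m, rng)
    flats = [(c, []) for c in centers]
    flats_low = [(project(g, c), []) for c in centers]
    points_low = [project(g, p) for p in points]
    return {
        rho: clustering_cost(points_low, flats_low, rho)
        / clustering_cost(points, flats, rho)
        for rho in (1, 2, math.inf)
    }


if __name__ == "__main__":
    for rho, ratio in demo().items():
        print(f"rho={rho}: projected/original cost ratio = {ratio:.3f}")
```

The same projected point set is reused for every value of `rho`, which mirrors the abstract's claim that one projection serves all norms at once. A ratio close to 1 for each `rho` is what one would expect from distance preservation, but this check covers a single fixed solution and is not a substitute for the paper's guarantee.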
