Streaming PTAS for Constrained k-Means
Abstract
We generalise the results of Bhattacharya et al. (Journal of Computing Systems, 62(1):93-115, 2018) for the list-$k$-means problem, defined as follows: for an (unknown) partition $X_1, \ldots, X_k$ of the dataset $X \subseteq \mathbb{R}^d$, find a list of $k$-center sets (each element of the list is a set of $k$ centers) such that at least one of the $k$-center sets $\{c_1, \ldots, c_k\}$ in the list gives a $(1+\varepsilon)$-approximation with respect to the cost function $\min_{\text{permutation } \pi} \left[ \sum_{i=1}^{k} \sum_{x \in X_i} \|x - c_{\pi(i)}\|^2 \right]$. The list-$k$-means problem is important for the constrained $k$-means problem since algorithms for the former can be converted to PTAS for various versions of the latter. The consequences of our generalisations are as follows:
- Streaming algorithm: Our $D^2$-sampling based algorithm, which runs in a single iteration, allows us to design a 2-pass, logspace streaming algorithm for the list-$k$-means problem. This can be converted to a 4-pass, logspace streaming PTAS for various constrained versions of the $k$-means problem.
- Faster PTAS under stability: Our generalisation is also useful in $k$-means clustering scenarios where finding good centers becomes easy once good centers for a few "bad" clusters have been chosen. One such scenario is clustering under stability, where the number of such bad clusters is a constant. Using this idea, we significantly improve the running time of the known algorithm from $O(dn^3)\,(k \log n)^{\mathrm{poly}(\frac{1}{\beta}, \frac{1}{\varepsilon})}$ to $O\left(dn^3\, k^{\tilde{O}_{\beta\varepsilon}\left(\frac{1}{\beta\varepsilon}\right)}\right)$.
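To make the cost function and the sampling step concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names (`sq_dist`, `partition_cost`, `best_in_list`, `d2_sample`) are hypothetical, the minimum over permutations is taken by brute force (feasible only for small $k$), and the sampler draws a single point with probability proportional to its squared distance from the nearest current center, which is the usual meaning of $D^2$-sampling.

```python
from itertools import permutations
import random


def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def partition_cost(clusters, centers):
    """Cost of a k-center set against a fixed partition X_1..X_k.

    Computes min over permutations pi of
    sum_i sum_{x in X_i} ||x - c_{pi(i)}||^2 by brute force (k! time).
    """
    k = len(clusters)
    assert len(centers) == k, "need exactly one center per cluster"
    return min(
        sum(sq_dist(x, centers[pi[i]]) for i in range(k) for x in clusters[i])
        for pi in permutations(range(k))
    )


def best_in_list(clusters, candidate_list):
    """Smallest partition cost achieved by any k-center set in the list."""
    return min(partition_cost(clusters, centers) for centers in candidate_list)


def d2_sample(points, centers, rng=random):
    """Draw one point with probability proportional to its squared
    distance from the nearest center in `centers`."""
    weights = [min(sq_dist(x, c) for c in centers) for x in points]
    if sum(weights) == 0:
        # Every point coincides with a center; fall back to uniform.
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]
```

In this sketch, a list of candidate center sets is a valid list-$k$-means output for a given partition when `best_in_list` is within a factor $(1+\varepsilon)$ of the optimal cost for that partition. The partition is passed in explicitly here only for illustration; in the problem as stated it is unknown to the algorithm.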