EPTAS for k-means Clustering of Affine Subspaces
Abstract
We consider a generalization of the fundamental k-means clustering to data with incomplete or corrupted entries. When data objects are represented by points in R^d, a data point is said to be incomplete when some of its entries are missing or unspecified. An incomplete data point with at most Δ unspecified entries corresponds to an axis-parallel affine subspace of dimension at most Δ, called a Δ-point. Thus we seek a partition of n input Δ-points into k clusters minimizing the k-means objective. For Δ = 0, when all coordinates of each point are specified, this is the usual k-means clustering. We give an algorithm that finds a (1 + ε)-approximate solution in time f(k, ε, Δ) · n^2 · d for some function f of k, ε, and Δ only.
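To make the objective concrete, here is a minimal sketch. It assumes a Δ-point is stored as a length-d vector with None marking each unspecified entry. The squared distance from a center c to a Δ-point is the squared distance to the nearest point of its axis-parallel subspace. Because the missing coordinates are free, it is just the sum over the specified coordinates. The cost therefore separates by coordinate, so the best center of a cluster takes, in each coordinate, the mean of the values specified there. The helper names (sq_dist, best_center, partition_cost, brute_force_kmeans) are hypothetical. The exhaustive search over all k^n labelings only illustrates what is being optimized. It is not the paper's algorithm, which runs in time f(k, ε, Δ) · n^2 · d.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# Assumption: a Δ-point is a length-d vector; None marks an unspecified entry.
DeltaPoint = Sequence[Optional[float]]


def sq_dist(center: Sequence[float], p: DeltaPoint) -> float:
    """Squared distance from `center` to the axis-parallel affine subspace `p`.
    Unspecified coordinates can be matched exactly, so they contribute 0."""
    return sum((c - x) ** 2 for c, x in zip(center, p) if x is not None)


def best_center(cluster: List[DeltaPoint], d: int) -> List[float]:
    """Optimal center of one cluster. The cost splits by coordinate, so each
    coordinate is the mean of the values specified in it. If no point in the
    cluster specifies a coordinate, any value is optimal; we use 0.0."""
    center = []
    for j in range(d):
        vals = [p[j] for p in cluster if p[j] is not None]
        center.append(sum(vals) / len(vals) if vals else 0.0)
    return center


def partition_cost(points: List[DeltaPoint], labels: Sequence[int], k: int) -> float:
    """k-means cost of a given assignment of points to clusters 0..k-1."""
    d = len(points[0]) if points else 0
    total = 0.0
    for c in range(k):
        cluster = [p for p, lab in zip(points, labels) if lab == c]
        if cluster:
            center = best_center(cluster, d)
            total += sum(sq_dist(center, p) for p in cluster)
    return total


def brute_force_kmeans(points: List[DeltaPoint], k: int) -> Tuple[float, Tuple[int, ...]]:
    """Exact optimum by trying all k^n labelings (exponential; tiny inputs only)."""
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for labels in product(range(k), repeat=len(points)):
        cost = partition_cost(points, labels, k)
        if best is None or cost < best[0]:
            best = (cost, labels)
    assert best is not None
    return best
```

With Δ = 0 (no None entries), partition_cost is the ordinary k-means cost. With missing entries, two different points such as (0.0, None) and (None, 5.0) can share the zero-cost center (0.0, 5.0), because each one constrains only the coordinate it specifies.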