Dimensionality Reduction for Sum-of-Distances Metric

Abstract

We give a dimensionality reduction procedure to approximate the sum of distances of a given set of n points in Rd to any "shape" that lies in a k-dimensional subspace. Here, by "shape" we mean any set of points in Rd. Our algorithm takes an input in the form of an n × d matrix A, where each row of A denotes a data point, and outputs a subspace P of dimension O(k3/ε6) such that the projections of each of the n points onto the subspace P and the distances of each of the points to the subspace P are sufficient to obtain an ε-approximation to the sum of distances to any arbitrary shape that lies in a k-dimensional subspace of Rd. These include important problems such as k-median, k-subspace approximation, and (j,l) subspace clustering with j · l ≤ k. Dimensionality reduction reduces the data storage requirement to (n+d)k3/ε6 from nnz(A). Here nnz(A) could potentially be as large as nd. Our algorithm runs in time nnz(A)/ε2 + (n+d)poly(k/ε), up to logarithmic factors. For dense matrices, where nnz(A) ≈ nd, we give a faster algorithm, that runs in time nd + (n+d)poly(k/ε) up to logarithmic factors. Our dimensionality reduction algorithm can also be used to obtain poly(k/ε) size coresets for k-median and (k,1)-subspace approximation problems in polynomial time.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…