k-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation
Abstract
Given an integer k≥1 and a set P of n points in ℝ^d, the classic k-PCA (Principal Component Analysis) approximates the affine k-subspace mean of P, which is the k-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances (ℓ_{2,2} norm) over the points of P, i.e., the mean of these distances. The k-subspace median is the subspace that minimizes its sum of (non-squared) Euclidean distances (ℓ_{2,1} mixed norm), i.e., their median. The median subspace is usually sparser and more robust to noise/outliers than the mean, but also much harder to approximate since, unlike the ℓ_{z,z} (non-mixed) norms, it is non-convex for k<d-1. We provide the first polynomial-time deterministic algorithm whose running time and approximation factor are both not exponential in k. More precisely, the multiplicative approximation factor is d, and the running time is polynomial in the size of the input. We expect that our technique would be useful for many other related problems, such as the ℓ_{2,z} norm of distances for z ∉ {1,2}, e.g., z=∞, and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.
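To make the two objectives in the abstract concrete, the following minimal sketch evaluates the sum of squared distances (the "mean" cost) and the sum of non-squared distances (the "median" cost) of a given affine k-subspace. It is not the paper's algorithm: it only computes the classic PCA subspace via an SVD and compares costs. The function names (`pca_subspace`, `mean_cost`, `median_cost`) and the toy data set are assumptions made for illustration.

```python
import numpy as np

def pca_subspace(P, k):
    """Classic k-PCA: affine k-subspace minimizing the sum of SQUARED distances.

    Returns (center, basis), where basis has k orthonormal rows.
    """
    center = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - center, full_matrices=False)
    return center, Vt[:k]

def distances(P, center, basis):
    """Euclidean distance from each point to the affine subspace center + span(basis)."""
    X = P - center
    residual = X - (X @ basis.T) @ basis  # remove the component inside the subspace
    return np.linalg.norm(residual, axis=1)

def mean_cost(P, center, basis):
    """Sum of squared Euclidean distances (the objective that PCA minimizes)."""
    return float(np.sum(distances(P, center, basis) ** 2))

def median_cost(P, center, basis):
    """Sum of non-squared Euclidean distances (the k-subspace median objective)."""
    return float(np.sum(distances(P, center, basis)))

# Toy data (hypothetical): 11 points on the x-axis plus one outlier at (0, 4).
P = np.array([[float(i), 0.0] for i in range(-5, 6)] + [[0.0, 4.0]])

pca_center, pca_basis = pca_subspace(P, k=1)

# Candidate line: the x-axis itself, which ignores the outlier.
axis_center = np.zeros(2)
axis_basis = np.array([[1.0, 0.0]])

print("PCA line    mean cost:", mean_cost(P, pca_center, pca_basis),
      " median cost:", median_cost(P, pca_center, pca_basis))
print("x-axis line mean cost:", mean_cost(P, axis_center, axis_basis),
      " median cost:", median_cost(P, axis_center, axis_basis))
```

On this toy input the PCA line is pulled toward the outlier: it has the lower sum of squared distances (about 14.67 versus 16), while the x-axis has the lower sum of non-squared distances (4 versus about 7.33). This illustrates, on a single example only, why the median objective is described as more robust to outliers.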