High-Dimensional Geometric Streaming for Nearly Low Rank Data
Abstract
We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$ such that $\left(\sum_{i=1}^{n} d(a_i, V)^p\right)^{1/p}$ is minimized, where $d(a, V)$ denotes the Euclidean distance between $a$ and $V$, defined as $\min_{v \in V} \|a - v\|_2$. When $p = \infty$, we need to find a subspace $V$ that minimizes $\max_i d(a_i, V)$. For $\ell_\infty$ subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a $\mathrm{poly}(k, \log n)$-approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear-space algorithm. For $\ell_p$ subspace approximation, we show that by suitably scaling the points and then using our $\ell_\infty$ coreset construction, we can compute a $\mathrm{poly}(k, \log n)$ approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results of a recent work of Woodruff and Yasuda (FOCS 2022), which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.
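To make the objective concrete, the following is a minimal NumPy sketch that evaluates the subspace approximation cost of a given candidate subspace, for finite p and for p = ∞. It only illustrates the definition above. It is not the streaming algorithm or the coreset construction from the paper, and the names `subspace_cost`, `points`, and `basis` are illustrative assumptions, as is the requirement that `basis` has full column rank.

```python
import numpy as np

def subspace_cost(points, basis, p):
    """Cost of the subspace spanned by the columns of `basis`.

    points: (n, d) array, one point a_i per row.
    basis:  (d, k) array with full column rank (assumed); its columns span V.
    p:      a positive number, or np.inf for the max-distance objective.
    """
    # Orthonormalise the basis so the projection onto V is x -> (x @ q) @ q.T.
    q, _ = np.linalg.qr(basis)
    # Residual of each point after projecting onto V.
    residual = points - (points @ q) @ q.T
    # Euclidean distance d(a_i, V) for every point.
    dists = np.linalg.norm(residual, axis=1)
    if np.isinf(p):
        return dists.max()
    return (dists ** p).sum() ** (1.0 / p)

# Three points in R^3 and the candidate subspace V = span(e_1).
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 3.0]])
basis = np.array([[1.0], [0.0], [0.0]])

cost_1 = subspace_cost(points, basis, 1)
cost_2 = subspace_cost(points, basis, 2)
cost_inf = subspace_cost(points, basis, np.inf)
```

Here the distances to V are 0, 2, and 3, so the p = 1 cost is their sum, the p = 2 cost is the square root of the sum of squares, and the p = ∞ cost is the largest distance.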