DiffRed: Dimensionality Reduction guided by stable rank
Abstract
In this work, we propose a novel dimensionality reduction technique, DiffRed, which first projects the data matrix $A$ along its first $k_1$ principal components and then projects the residual matrix $A^*$ (what is left after subtracting the rank-$k_1$ approximation of $A$) along $k_2$ Gaussian random vectors. We evaluate two metrics: $M_1$, the distortion of the mean-squared pairwise distance, and Stress, the normalized RMS distortion of the pairwise distances. We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{(1-p)/k_2}\right)$ on Stress and $O\left((1-p)/\sqrt{k_2\,\rho(A^*)}\right)$ on $M_1$, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^*)$ is the stable rank of $A^*$. These bounds are tighter than the currently known results for random maps. Our extensive experiments on a variety of real-world datasets demonstrate that DiffRed achieves near-zero $M_1$ and much lower Stress than well-known dimensionality reduction techniques. In particular, DiffRed can map a 6-million-dimensional dataset to 10 dimensions with 54% lower Stress than PCA.
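The two-stage projection and the quantities in the bounds can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names (`diffred_sketch`, `pairwise_dists`, `stress`, `m1`, `stable_rank`) are hypothetical. Several details are assumptions: the rows are centered before PCA, the Gaussian projection is scaled by $1/\sqrt{k_2}$ so squared residual distances are preserved in expectation, and $M_1$ is read as $\left|1 - \overline{\hat d^{\,2}} / \overline{d^{\,2}}\right|$ over all pairs. Stress uses the usual normalized form $\sqrt{\sum (d - \hat d)^2 / \sum d^2}$, and stable rank is $\rho(M) = \lVert M\rVert_F^2 / \lVert M\rVert_2^2$.

```python
import numpy as np


def diffred_sketch(A, k1, k2, seed=0):
    """Hypothetical sketch: top-k1 PCA coordinates of A concatenated with a
    k2-dimensional Gaussian random projection of the residual A*."""
    rng = np.random.default_rng(seed)
    Ac = A - A.mean(axis=0)                   # centering leaves pairwise distances unchanged
    _, s, Vt = np.linalg.svd(Ac, full_matrices=False)
    V1 = Vt[:k1].T                            # D x k1 matrix of the first k1 principal directions
    pca_part = Ac @ V1                        # n x k1 coordinates along those directions
    A_star = Ac - pca_part @ V1.T             # residual after subtracting the rank-k1 approximation
    G = rng.standard_normal((A.shape[1], k2)) # k2 Gaussian random vectors (assumed N(0, 1) entries)
    rand_part = A_star @ G / np.sqrt(k2)      # 1/sqrt(k2) makes squared residual distances unbiased
    p = np.sum(s[:k1] ** 2) / np.sum(s ** 2)  # fraction of variance explained by the k1 PCs
    return np.hstack([pca_part, rand_part]), A_star, p


def pairwise_dists(X):
    """Dense matrix of Euclidean distances between the rows of X (small n only)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def stress(D, D_hat):
    """Normalized RMS distortion of pairwise distances, over pairs i < j."""
    iu = np.triu_indices(D.shape[0], k=1)
    d, dh = D[iu], D_hat[iu]
    return np.sqrt(np.sum((d - dh) ** 2) / np.sum(d ** 2))


def m1(D, D_hat):
    """Assumed form of M1: relative error of the mean-squared pairwise distance."""
    iu = np.triu_indices(D.shape[0], k=1)
    return abs(1.0 - np.mean(D_hat[iu] ** 2) / np.mean(D[iu] ** 2))


def stable_rank(M):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)    # singular values in descending order
    return np.sum(s ** 2) / s[0] ** 2


# Usage on synthetic anisotropic data (200 points in 50 dimensions).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50)) @ np.diag(np.linspace(5.0, 0.1, 50))
Z, A_star, p = diffred_sketch(A, k1=5, k2=5)
D, D_hat = pairwise_dists(A), pairwise_dists(Z)
print(Z.shape, p, stress(D, D_hat), m1(D, D_hat), stable_rank(A_star))
```

Varying `k1` and `k2` on such data is one way to see the trade-off the bounds describe: a larger $p$ shrinks the residual left for the random stage, and a larger $k_2$ reduces the variance of the random projection.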