Composition of nested embeddings with an application to outlier removal
Abstract
We study the design of embeddings into Euclidean space with outliers. Given a metric space (X,d) and an integer k, the goal is to embed all but k points in X (called the "outliers") into ℓ_2 with the smallest possible distortion c. Finding the optimal distortion c for a given outlier set size k, or alternatively the smallest k for a given target distortion c, are both NP-hard problems. In fact, it is UGC-hard to approximate k to within a factor smaller than 2 even when the metric sans outliers is isometrically embeddable into ℓ_2. We consider bi-criteria approximations. Our main result is a polynomial time algorithm that approximates the outlier set size to within an O(log^2 k) factor and the distortion to within a constant factor. The main technical component in our result is an approach for constructing Lipschitz extensions of embeddings into Banach spaces (such as ℓ_p spaces). We consider a stronger version of Lipschitz extension that we call a nested composition of embeddings: given a low distortion embedding of a subset S of the metric space X, our goal is to extend this embedding to all of X such that the distortion over S is preserved, whereas the distortion over the remaining pairs of points in X is bounded by a function of the size of X \ S. Prior work on Lipschitz extension considers settings where the size of X is potentially much larger than that of S and the expansion bounds depend on |S|. In our setting, the set S is nearly all of X and the remaining set X \ S, a.k.a. the outliers, is small. We achieve an expansion bound that is logarithmic in |X \ S|.
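To make the objective concrete, here is a minimal sketch of how the distortion of an embedding with outliers could be measured. It is not the paper's algorithm. It assumes the standard definition of distortion (worst-case expansion times worst-case contraction over the kept pairs), and the function name `distortion`, the four-point metric, and the embedding `f` are all hypothetical illustrations.

```python
import math
from itertools import combinations


def distortion(points, d, f, outliers=frozenset()):
    """Distortion of embedding f on the points that are not outliers.

    Assumes the standard definition: (max expansion) * (max contraction)
    over all pairs of kept points, and that kept points have distinct images.
    """
    kept = [p for p in points if p not in outliers]
    expansion = contraction = 0.0
    for u, v in combinations(kept, 2):
        embedded = math.dist(f[u], f[v])  # Euclidean (l_2) distance
        expansion = max(expansion, embedded / d(u, v))
        contraction = max(contraction, d(u, v) / embedded)
    return expansion * contraction


# Hypothetical metric: a, b, c lie on a path; z is at distance 1 from all three.
dist_table = {
    frozenset("ab"): 1.0, frozenset("bc"): 1.0, frozenset("ac"): 2.0,
    frozenset("az"): 1.0, frozenset("bz"): 1.0, frozenset("cz"): 1.0,
}


def d(u, v):
    return dist_table[frozenset((u, v))]


# Hypothetical embedding into the plane.
f = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "z": (1.0, 1.0)}
points = ["a", "b", "c", "z"]

full = distortion(points, d, f)                    # all four points kept
without_z = distortion(points, d, f, outliers={"z"})  # z treated as an outlier
```

In this toy instance, treating z as the single outlier (k = 1) leaves a set that this particular map embeds isometrically, whereas keeping z forces some pair to be stretched. The problem in the abstract asks for the best such trade-off between k and c over all embeddings and outlier sets.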