Practical Data-Dependent Metric Compression with Provable Guarantees

Abstract

We introduce a new distance-preserving compact representation of multi-dimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O( d (d B/ε) + n) bits per point from which one can approximate the distances up to a factor of 1 ε. Our algorithm almost matches the recent bound of~indyk2017near while being much simpler. We compare our algorithm to Product Quantization (PQ)~jegou2011product, a state of the art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in jegou2011product), MNIST~lecun1998mnist, New York City taxi time series~guha2016robust and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned parameters, our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…