Binary Embedding: Fundamental Limits and Fast Algorithm

Abstract

Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary N distinct points in Sp-1, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δ uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound that shows any binary embedding oblivious to the set of points requires m = (1δ2N) bits and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3) we also provide an analytic result about embedding a general set of points K ⊂eq Sp-1 with even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…