Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem

Ilya Volnyansky

Abstract

In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces $(\Omega_d)$ from which we pick datasets $X_d$ in an i.i.d. fashion. We call the subscript $d$ the dimension of the space $\Omega_d$ (e.g. for $\mathbb{R}^d$ the dimension is just the usual one), and we allow the size of the dataset $n = n_d$ to be such that $d$ is superlogarithmic but subpolynomial in $n$. We study the asymptotic performance of pivot-based indexing schemes where the number of pivots is $o(n/d)$. We pick the relatively simple cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the spaces $\Omega_d$ exhibit the (fairly common) concentration of measure phenomenon, the performance of similarity search using such indexes is asymptotically linear in $n$. That is, for large enough $d$ the difference between using such an index and performing a search without an index at all is negligible. Thus we confirm the curse of dimensionality in this setting.
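To make the cost model concrete, here is a minimal Python sketch. It is not the construction analyzed in the paper. It assumes a LAESA-style pivot table (pivots drawn from the data, with all point-to-pivot distances precomputed), data drawn uniformly from the cube $[0,1]^d$ with Euclidean distance (a space that exhibits concentration of measure), and nearest-neighbor queries. The names `PivotIndex`, `nearest`, and `average_fraction` are hypothetical. Following the abstract, each query is charged only for its distance computations, and building the pivot table is treated as free preprocessing.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical LAESA-style pivot table: stores d(x, p) for every point x and pivot p."""

    def __init__(self, data, num_pivots, rng):
        self.data = data
        self.pivots = rng.sample(data, num_pivots)
        # Building the table is preprocessing and is not charged to any query.
        self.table = [[euclidean(x, p) for p in self.pivots] for x in data]

    def nearest(self, q):
        """Return (nearest point, distance computations spent on this query)."""
        q_to_pivots = [euclidean(q, p) for p in self.pivots]
        count = len(self.pivots)
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p,
        # so the largest such gap is a lower bound on d(q, x).
        bounds = [
            max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
            for row in self.table
        ]
        best, best_dist = None, math.inf
        # Visit points in order of increasing lower bound.
        for i in sorted(range(len(self.data)), key=bounds.__getitem__):
            if bounds[i] >= best_dist:
                break  # every remaining point is provably no closer than best
            dist = euclidean(q, self.data[i])
            count += 1
            if dist < best_dist:
                best, best_dist = self.data[i], dist
        return best, count


def average_fraction(dim, n=1000, num_pivots=16, queries=10, seed=0):
    """Mean distance computations per query, divided by n, on uniform data in [0, 1]^dim."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    index = PivotIndex(data, num_pivots, rng)
    total = 0
    for _ in range(queries):
        q = [rng.random() for _ in range(dim)]
        total += index.nearest(q)[1]
    return total / (queries * n)


if __name__ == "__main__":
    for dim in (2, 8, 32, 128):
        print(f"dim={dim:4d}  fraction of n examined: {average_fraction(dim):.3f}")
```

Under concentration of measure one would expect the printed fraction to be small for `dim=2` and to approach 1 as `dim` grows (slightly above 1, since the query-to-pivot distances are also counted). In high dimension the pivot lower bounds become small relative to the nearest-neighbor distance, so almost no point can be pruned and the index degenerates into a linear scan. The fixed sizes here are illustrative only and do not reproduce the asymptotic regime studied in the paper ($d$ superlogarithmic but subpolynomial in $n$, $o(n/d)$ pivots).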
