Curse of Dimensionality in Pivot-based Indexes

Abstract

We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples $X_d$ picked in i.i.d. fashion from metric spaces $\Omega_d$. We allow the size of the dataset $n = n_d$ to be such that $d$, the "dimension", is superlogarithmic but subpolynomial in $n$. The number of pivots is allowed to grow as $o(n/d)$. We pick the least restrictive cost model of similarity search, where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces $\Omega_d$, in the sense of the concentration of measure phenomenon, is $O(d)$, then the performance of similarity search pivot-based indexes is asymptotically linear in $n$.
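To make the cost model concrete, the following is a minimal sketch of a generic pivot-based range query. It is not the construction analysed in the paper. The names `PivotIndex`, `range_query`, and `euclidean` are hypothetical, and the Euclidean metric and list-of-floats points are assumptions chosen for illustration. As in the abstract, only distance computations made at query time are counted: distances from the query to each pivot, plus distances to the points that the triangle-inequality filter fails to discard.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; stands in for an arbitrary metric (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical pivot table: stores d(x, p) for every point x and pivot p."""

    def __init__(self, data, pivots, dist=euclidean):
        self.data = data
        self.pivots = pivots
        self.dist = dist
        # Built once, ahead of any query, so not charged to query cost.
        self.table = [[dist(x, p) for p in pivots] for x in data]

    def range_query(self, q, radius):
        """Return (indices of points within radius of q, distance computations used)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        cost = len(self.pivots)  # one distance computation per pivot
        hits = []
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
            lower = max(
                (abs(qp - xp) for qp, xp in zip(q_to_pivots, row)),
                default=0.0,
            )
            if lower > radius:
                continue  # discarded without computing d(q, x)
            cost += 1  # the filter failed, so d(q, x) must be computed
            if self.dist(q, self.data[i]) <= radius:
                hits.append(i)
        return hits, cost


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n, k = 8, 1000, 10  # illustrative sizes only
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    index = PivotIndex(data, rng.sample(data, k))
    query = [rng.random() for _ in range(dim)]
    hits, cost = index.range_query(query, radius=0.5)
    print(f"{len(hits)} hits using {cost} of at most {n + k} distance computations")
```

A linear scan costs exactly $n$ distance computations, so the index helps only when the filter discards most points. The abstract's claim is that, under the stated assumptions on $d$, the number of pivots, and the intrinsic dimension of $\Omega_d$, the query cost of any such scheme is asymptotically linear in $n$.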
