Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Abstract

Within a mathematically rigorous model, we analyse the curse of dimensionality for deterministic exact similarity search in the context of popular indexing schemes: metric trees. The datasets $X$ are sampled randomly from a domain $\Omega$, equipped with a distance, $\rho$, and an underlying probability distribution, $\mu$. While performing an asymptotic analysis, we send the intrinsic dimension $d$ of $\Omega$ to infinity, and assume that the size of a dataset, $n$, grows superpolynomially yet subexponentially in $d$. Exact similarity search refers to finding the nearest neighbour in the dataset $X$ to a query point $\omega\in\Omega$, where the query points are subject to the same probability distribution $\mu$ as datapoints. Let $\mathcal{F}$ denote the class of all 1-Lipschitz functions on $\Omega$ that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets $\{\omega\colon f(\omega)\geq a\}$, with $f\in\mathcal{F}$ and $a\in\mathbb{R}$, is $o(n^{1/4}/\lg^2 n)$. (In view of a 1995 result of Goldberg and Jerrum, even a stronger complexity assumption $d^{O(1)}$ is reasonable.) We deduce the $\Omega(n^{1/4})$ lower bound on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in $(\Omega,X)$. In particular, this bound is superpolynomial in $d$.
