Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

Abstract

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log n$ instead of $\sim n \log n$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
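The construction described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it samples points uniformly in the unit cube, connects each point to k' neighbors drawn at random from its k nearest, and reports the size of the largest connected component. The function names, the brute-force neighbor search, and the constants multiplying the logarithms are all assumptions made for the example; the abstract does not give values for the constants.

```python
import math
import random


def random_near_neighbor_graph(points, k, k_prime, rng):
    """Connect each point to k_prime neighbors drawn uniformly from its k nearest."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Brute-force nearest-neighbor search (excluding i itself); fine for small n.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in rng.sample(others[:k], k_prime):
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d = 500, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]

# Illustrative constants only: the factor 2 stands in for c_{d,1} and c_{d,2}.
k = math.ceil(2 * math.log(n))
k_prime = max(2, math.ceil(2 * math.log(math.log(n))))

edges = random_near_neighbor_graph(points, k, k_prime, rng)
giant = largest_component_size(n, edges)
print(f"k={k}, k'={k_prime}, edges={len(edges)}, largest component={giant}/{n}")
```

Since each point selects k' neighbors, the graph has at most n·k' edges, compared with up to n·k for the full k-nearest-neighbor graph. Varying `k_prime` in this sketch is one way to explore how few random neighbors still leave most points in a single component.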
