Spectral Clustering in Birthday Paradox Time
Abstract
Given a vertex in a $(k, \varphi, \epsilon)$-clusterable graph, i.e. a graph whose vertex set can be partitioned into a disjoint union of $\varphi$-expanders of size $\approx n/k$ with outer conductance bounded by $\epsilon$, can one quickly tell which cluster it belongs to? This question goes back to the expansion testing problem of Goldreich and Ron'11. For $k=2$ a sample of $\approx n^{1/2+O(\epsilon/\varphi^2)}$ logarithmic length walks from a given vertex approximately determines its cluster membership by the birthday paradox: two vertices whose random walk samples are "close" are likely in the same cluster. The study of the general case $k>2$ was initiated by Czumaj, Peng and Sohler [STOC'15], and the works of Chiplunkar et al. [FOCS'18], Gluch et al. [SODA'21] showed that $\approx \text{poly}(k)\cdot n^{1/2+O(\epsilon/\varphi^2)}$ random walk samples suffice for general $k$. This matches the $k=2$ result up to polynomial factors in $k$, but creates a conceptual inconsistency: if the birthday paradox is the guiding phenomenon, then the query complexity should decrease with the number of clusters $k$! Since clusters have size $\approx n/k$, we expect to need $\approx (n/k)^{1/2+O(\epsilon/\varphi^2)}$ random walk samples, which decreases with $k$. We design a novel representation of vertices in a $(k, \varphi, \epsilon)$-clusterable graph by a mixture of logarithmic length walks. This representation uses the optimal $\approx (n/k)^{1/2+O(\epsilon/\varphi^2)}$ walks per vertex, and allows for a fast nearest neighbor search: given $k$ vertices representing the clusters, we can find the cluster of a given query vertex $x$ using nearly linear time in the representation size of $x$. This gives a clustering oracle with query time $\approx (n/k)^{1/2+O(\epsilon/\varphi^2)}$ and space complexity $k\cdot (n/k)^{1/2+O(\epsilon/\varphi^2)}$, matching the birthday paradox bound.
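The birthday-paradox intuition can be illustrated with a toy experiment. The sketch below is not the paper's algorithm (it does not build the mixture-of-walks representation); it only assumes a simple collision test: endpoints of short random walks from two vertices in the same cluster of size about n/k collide with probability about k/n, so roughly (n/k)^{1/2} walks per vertex are enough to start seeing collisions, while walks from different clusters rarely collide. The graph model (cliques joined by a few cross edges) and all function names and parameters are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def make_clustered_graph(k, size, inter_edges, rng):
    """Adjacency lists for k cliques of `size` vertices (assumed toy model),
    joined by at most `inter_edges` random cross-cluster edges."""
    n = k * size
    adj = [[] for _ in range(n)]
    for c in range(k):
        block = range(c * size, (c + 1) * size)
        for u in block:
            adj[u].extend(v for v in block if v != u)
    for _ in range(inter_edges):
        u, v = rng.randrange(n), rng.randrange(n)
        if u // size != v // size:  # keep only edges between different cliques
            adj[u].append(v)
            adj[v].append(u)
    return adj


def walk_endpoints(adj, start, num_walks, length, rng):
    """Count the endpoints of `num_walks` simple random walks of `length` steps."""
    ends = Counter()
    for _ in range(num_walks):
        v = start
        for _ in range(length):
            v = rng.choice(adj[v])
        ends[v] += 1
    return ends


def collision_score(a, b, num_walks):
    """Fraction of endpoint pairs (one from each sample) that coincide."""
    return sum(count * b[v] for v, count in a.items()) / (num_walks ** 2)


def classify(adj, query, reps, num_walks, length, rng):
    """Return the index of the representative whose walk sample collides most
    often with the query's sample (a naive scan, not a fast nearest neighbor search)."""
    q = walk_endpoints(adj, query, num_walks, length, rng)
    scores = [
        collision_score(q, walk_endpoints(adj, r, num_walks, length, rng), num_walks)
        for r in reps
    ]
    return max(range(len(reps)), key=scores.__getitem__)
```

For example, with `k=4` cliques of 50 vertices and one representative per clique, calling `classify` with a few hundred walks of length 8 should assign a query vertex to its own clique with high probability. The sample size here is deliberately larger than (n/k)^{1/2} to keep the toy experiment stable.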