A Framework for Building Data Structures from Communication Protocols
Abstract
We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems ($\exists?\; i\in[n],\ f(x_i,y)=1$) through communication models in which $f(x,y)$ admits sublinear communication protocols with exponentially small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of $f(x,y)$ under product distributions. We apply our framework to the Partial Match problem (a.k.a. matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of $n$ points in dimension $d$, and the number of $\star$'s in the query is at most $w = c\log n \;(\leq d)$, the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time $t \approx 2^w = n^c$, which is nontrivial only when $c<1$. By contrast, our framework produces a data structure with query time $n^{1-1/(c\log^2 c)}$ and space close to linear. To achieve this, we develop a one-sided $\epsilon$-error communication protocol for Set-Disjointness under product distributions with $\tilde{\Theta}(\sqrt{d\log(1/\epsilon)})$ complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of $w$-Sparse Set-Disjointness with $\epsilon$-error under product distributions is $\tilde{O}(\sqrt{w\log(1/\epsilon)})$, independent of the ambient dimension $d$, which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures, which is instrumental for reducing to the (much easier) case of product distributions.
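To make the Partial Match problem concrete, the following is a minimal Python sketch of the brute-force linear scan that any data structure for it must beat, together with one textbook way of rephrasing a single point-versus-query comparison as set disjointness. It is an illustration only, not the paper's data structure or protocol. All names (`matches`, `partial_match_scan`, `point_as_set`, `query_as_set`) are hypothetical, points are assumed to be bit strings, and `*` is assumed to mark a wildcard. The set encoding shown is a standard one and may differ from the $w$-sparse encoding the paper analyses.

```python
from typing import Optional, Sequence, Set, Tuple

WILDCARD = "*"  # assumed wildcard symbol for this sketch


def matches(point: str, query: str) -> bool:
    """True if the query agrees with the point on every non-wildcard coordinate."""
    return len(point) == len(query) and all(
        q == WILDCARD or q == p for p, q in zip(point, query)
    )


def partial_match_scan(database: Sequence[str], query: str) -> Optional[int]:
    """Brute-force baseline: scan all n points, O(n * d) time per query."""
    for i, point in enumerate(database):
        if matches(point, query):
            return i
    return None


def point_as_set(point: str) -> Set[Tuple[int, str]]:
    """Encode a point as the set of (coordinate, bit) pairs it contains."""
    return {(j, b) for j, b in enumerate(point)}


def query_as_set(query: str) -> Set[Tuple[int, str]]:
    """Encode a query as the set of (coordinate, bit) pairs it forbids."""
    return {
        (j, "1" if q == "0" else "0")
        for j, q in enumerate(query)
        if q != WILDCARD
    }


def matches_via_disjointness(point: str, query: str) -> bool:
    """The point matches the query exactly when the two encoded sets are disjoint."""
    return len(point) == len(query) and point_as_set(point).isdisjoint(
        query_as_set(query)
    )


database = ["0110", "1011", "0001"]
print(partial_match_scan(database, "1*1*"))
print(partial_match_scan(database, "111*"))
```

Under this encoding, the existential question "does some database point match the query?" becomes "is some database set disjoint from the query set?", which is the shape of problem ($\exists?\; i\in[n],\ f(x_i,y)=1$) that the framework addresses with $f$ taken to be set-disjointness.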