A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries
Abstract
We consider the scenario of n sensor nodes observing streams of data. The nodes are connected to a central server whose task is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the k currently lowest observed values, or a value with rank in [(1-ε)k, (1+ε)k] with probability 1-δ. We propose solutions for these problems in an extension of the distributed monitoring model in which the server can send broadcast messages to all nodes at unit cost. We want to minimize communication over multiple time steps, where there are m updates to a node's value between queries. The result is composed of two main parts, each of which may be of independent interest: (1) Protocols that answer Top-k and k-Select queries. These protocols are memoryless in the sense that they gather all information at the time of the request. (2) A dynamic data structure that tracks, for every k, an element whose rank is close to k. We describe how to combine the two parts to obtain a protocol answering the stated queries over multiple time steps. Overall, for Top-k queries we use O(k + log m + log log n) messages, and for k-Select queries O((1/ε²) log(1/δ) + log m + log² log n) messages, in expectation. These results are shown to be asymptotically tight if m is not too small.
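The abstract fixes the cost model (the server broadcasts to all nodes at unit cost, and each node-to-server message costs one unit) but does not spell out the protocols. As a hedged illustration of that cost model only, the Python sketch below simulates a naive, memoryless Top-k baseline for a single time step: k successive minimum searches, each run as randomized rounds in which candidate nodes reply with geometrically increasing probability. This is not the authors' protocol; the names `find_min_above` and `top_k`, the `cost` counter and the round structure are hypothetical, and all observed values are assumed to be distinct. The baseline always returns the exact k lowest values, but it spends ⌈log2 n⌉ + 1 broadcasts per search and, since we expect only a constant number of replies per round, roughly O(log n) messages per search, i.e. on the order of k log n overall, well above the bounds stated in the abstract.

```python
import math
import random


def find_min_above(values, lower, rng, cost):
    """Return the smallest value strictly greater than `lower`, or None.

    Hypothetical round-based protocol (illustration only): in round r the
    server broadcasts `lower` and its current best (1 broadcast); every node
    whose value lies strictly between them replies with probability
    min(1, 2**r / n) (1 message per reply). Replies in a round are simultaneous.
    """
    n = len(values)
    if n == 0:
        return None
    best = math.inf
    # ceil(log2 n) + 1 rounds: in the last one 2**r >= n, so p == 1 and every
    # remaining candidate replies, which makes the result exact.
    for r in range((n - 1).bit_length() + 1):
        cost["broadcasts"] += 1
        p = min(1.0, 2 ** r / n)
        replies = [v for v in values if lower < v < best and rng.random() < p]
        cost["messages"] += len(replies)
        if replies:
            best = min(replies)
    return None if best == math.inf else best


def top_k(values, k, seed=0):
    """Naive baseline: k successive minimum searches (assumes distinct values)."""
    rng = random.Random(seed)
    cost = {"broadcasts": 0, "messages": 0}
    result, lower = [], -math.inf
    for _ in range(k):
        nxt = find_min_above(values, lower, rng, cost)
        if nxt is None:  # fewer than k nodes in total
            break
        result.append(nxt)
        # The new lower bound travels with the next search's broadcasts,
        # so nodes that were already reported stop participating.
        lower = nxt
    return result, cost


if __name__ == "__main__":
    values = random.Random(1).sample(range(10_000), 1_000)  # one value per node
    top, cost = top_k(values, 5)
    print(top == sorted(values)[:5], cost)
```

The sketch is memoryless in the abstract's sense: every query recomputes its answer from scratch, so it gains nothing when only a few of the m updates between queries actually affect the lowest values.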