Heavy hitters via cluster-preserving clustering

Mikkel Thorup

Abstract

In turnstile $\ell_p$ $\varepsilon$-heavy hitters, one maintains a high-dimensional $x \in \mathbb{R}^n$ subject to $\mathrm{update}(i, \Delta)$ causing $x_i \leftarrow x_i + \Delta$, where $i \in [n]$, $\Delta \in \mathbb{R}$. Upon receiving a query, the goal is to report a small list $L \subset [n]$, $|L| = O(1/\varepsilon^p)$, containing every "heavy hitter" $i \in [n]$ with $|x_i| \ge \varepsilon \|x_{\overline{1/\varepsilon^p}}\|_p$, where $x_{\overline{k}}$ denotes the vector obtained by zeroing out the largest $k$ entries of $x$ in magnitude.

For any $p \in (0, 2]$ the CountSketch solves $\ell_p$ heavy hitters using $O(\varepsilon^{-p} \log n)$ words of space with $O(\log n)$ update time, $O(n \log n)$ query time to output $L$, and whose output after any query is correct with high probability (whp) $1 - 1/\mathrm{poly}(n)$. Unfortunately the query time is very slow. To remedy this, the work [CM05] proposed for $p = 1$ in the strict turnstile model, a whp correct algorithm achieving suboptimal space $O(\varepsilon^{-1} \log^2 n)$, worse update time $O(\log^2 n)$, but much better query time $O(\varepsilon^{-1} \mathrm{poly}(\log n))$.

We show this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, ExpanderSketch, which in the most general turnstile model achieves optimal $O(\varepsilon^{-p} \log n)$ space, $O(\log n)$ update time, and fast $O(\varepsilon^{-p} \mathrm{poly}(\log n))$ query time, and whp correctness. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We then develop a "cluster-preserving clustering" algorithm, partitioning the graph into clusters without destroying any original cluster.
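To make the problem definition and the CountSketch baseline concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not ExpanderSketch and not the authors' code. `tail_lp_norm` and `exact_heavy_hitters` are hypothetical helpers that compute $\|x_{\overline{k}}\|_p$ and the exact heavy-hitter set by brute force. `CountSketch` is the textbook structure with `rows` rows of `width` signed counters. It assumes hash functions are simulated by explicit seeded random tables, which costs extra memory a real implementation would avoid. Its `query` estimates every one of the $n$ coordinates, which is exactly the slow query step the abstract describes.

```python
import math
import random
import statistics


def tail_lp_norm(x, k, p):
    """l_p norm of x after zeroing its k largest-magnitude entries."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(m ** p for m in mags[k:]) ** (1.0 / p)


def exact_heavy_hitters(x, eps, p):
    """Brute-force reference: all i with |x_i| >= eps * ||x_tail(ceil(1/eps^p))||_p."""
    k = math.ceil(1 / eps ** p)
    threshold = eps * tail_lp_norm(x, k, p)
    return [i for i, v in enumerate(x) if abs(v) >= threshold]


class CountSketch:
    """Textbook CountSketch with `rows` rows of `width` signed counters.

    Assumption for illustration: hashes are explicit seeded random tables,
    costing O(n * rows) memory; real implementations use limited-independence
    hash families so that only the counters themselves are stored.
    """

    def __init__(self, n, width, rows, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.table = [[0.0] * width for _ in range(rows)]
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]

    def update(self, i, delta):
        """Turnstile update x_i <- x_i + delta; touches one counter per row."""
        for r, row in enumerate(self.table):
            row[self.bucket[r][i]] += self.sign[r][i] * delta

    def estimate(self, i):
        """Median over rows of the signed counter that coordinate i hashes to."""
        return statistics.median(
            self.sign[r][i] * row[self.bucket[r][i]] for r, row in enumerate(self.table)
        )

    def query(self, size):
        """Slow query: estimate all n coordinates, return the `size` largest."""
        order = sorted(range(self.n), key=lambda i: abs(self.estimate(i)), reverse=True)
        return order[:size]
```

In the abstract's regime the width would be on the order of $1/\varepsilon^p$ and the number of rows on the order of $\log n$, giving the $O(\varepsilon^{-p} \log n)$ counters and $O(\log n)$ update time. Because `query` calls `estimate` for every index, its cost grows with $n \log n$ rather than with the output size.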
