Frequency Estimation with One-Sided Error

Abstract

Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U = \{1, \ldots, n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. The frequency estimator $\tilde{x}$ produced by Count-Min, using $O(1/\varepsilon \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \varepsilon \|x\|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\varepsilon^2 \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \varepsilon \|x\|_2$ with high probability. A natural question is whether it is possible to design a best-of-both-worlds sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, but which (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to the above question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required to not over-estimate, i.e., $\tilde{x} \le x$ should hold always.
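To make the contrast between the two baseline sketches concrete, here is a minimal Python sketch of textbook Count-Min and Count-Sketch. It is an illustration under stated assumptions, not the construction or the lower-bound argument of the paper: the hash family `(a * item + b) mod P`, the class names, the table sizes, and the synthetic stream are all hypothetical choices made for this example.

```python
import random
from collections import Counter
from statistics import median

P = (1 << 61) - 1  # Mersenne prime for an illustrative (assumed) hash family


class CountMin:
    """Textbook Count-Min: depth rows of width counters, estimate = row minimum."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % P) % self.width

    def update(self, item, count=1):
        # Assumes count >= 0, matching the x >= 0 requirement of Count-Min.
        for r in range(self.depth):
            self.table[r][self._bucket(r, item)] += count

    def estimate(self, item):
        # Every bucket holds the item's count plus nonnegative collisions,
        # so the minimum can never fall below the true count.
        return min(self.table[r][self._bucket(r, item)] for r in range(self.depth))


class CountSketch(CountMin):
    """Textbook Count-Sketch: signed updates, estimate = median over rows."""

    def __init__(self, width, depth, seed=0):
        super().__init__(width, depth, seed)
        rng = random.Random(seed + 1)
        self.sign_hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(depth)]

    def _sign(self, row, item):
        a, b = self.sign_hashes[row]
        return 1 if ((a * item + b) % P) % 2 == 0 else -1

    def update(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._bucket(r, item)] += self._sign(r, item) * count

    def estimate(self, item):
        # Collisions carry random signs, so the error can be positive or negative.
        return median(
            self._sign(r, item) * self.table[r][self._bucket(r, item)]
            for r in range(self.depth)
        )


# Hypothetical stream: uniform noise over items 1..49 plus one heavy item (7).
rng = random.Random(1)
stream = [rng.randrange(1, 50) for _ in range(2000)] + [7] * 500
true_counts = Counter(stream)

cm = CountMin(width=32, depth=5)
cs = CountSketch(width=32, depth=5)
for item in stream:
    cm.update(item)
    cs.update(item)

cm_never_under = all(cm.estimate(i) >= true_counts[i] for i in range(1, 50))
cs_errors = [cs.estimate(i) - true_counts[i] for i in range(1, 50)]
print("Count-Min never underestimates:", cm_never_under)
print("Count-Sketch error range:", min(cs_errors), max(cs_errors))
```

In this sketch the Count-Min check holds for any stream of nonnegative updates, because each counter only accumulates nonnegative collisions. Count-Sketch offers no such one-sided guarantee: its signed collisions can push an estimate above or below the true count, which is the gap the abstract asks about.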
