The SpaceSaving Family of Algorithms for Data Streams with Bounded Deletions

Abstract

In this paper, we present an advanced analysis of near-optimal algorithms that use limited space to solve the frequency estimation, heavy hitters, frequent items, and top-k approximation problems in the bounded deletion model. We define the family of SpaceSaving algorithms and explain why the original SpaceSaving algorithm only works when insertions and deletions are not interleaved. Next, we propose three new algorithms, Double SpaceSaving, Unbiased Double SpaceSaving, and Integrated SpaceSaving, and prove their correctness. The proposed algorithms represent different trade-offs: Double SpaceSaving can be extended to provide unbiased estimates, while Integrated SpaceSaving uses less space. Since data streams are often skewed, we present an improved analysis of these algorithms and show that their errors do not depend on the hot items. We also demonstrate how to achieve relative error guarantees under mild assumptions. Moreover, we establish that all three algorithms satisfy the important mergeability property, which is essential for running them in distributed settings.
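For readers unfamiliar with the baseline, the following is a minimal sketch of the classic insertion-only SpaceSaving algorithm as it is commonly described, not of the Double, Unbiased Double, or Integrated variants proposed in this paper. The function name `space_saving`, the parameter `k` (number of counters), and the dictionary-based layout are illustrative assumptions; practical implementations typically use a structure that finds the minimum counter faster than a linear scan.

```python
def space_saving(stream, k):
    """Classic insertion-only SpaceSaving with at most k counters.

    Illustrative sketch only: handles insertions, not deletions.
    Returns a dict mapping each monitored item to its estimated count.
    """
    counts = {}
    for item in stream:
        if item in counts:
            # Item is already monitored: increment its counter.
            counts[item] += 1
        elif len(counts) < k:
            # A free counter is available: start monitoring the item.
            counts[item] = 1
        else:
            # No free counter: evict an item with the minimum count and
            # let the new item inherit that count plus one.
            victim = min(counts, key=counts.get)
            min_count = counts.pop(victim)
            counts[item] = min_count + 1
    return counts


if __name__ == "__main__":
    estimates = space_saving(["a", "a", "b", "a", "c", "a", "d"], k=2)
    print(estimates)
```

In this sketch the counters always sum to the number of insertions processed, and a monitored item's estimate never falls below its true count. The eviction step is where the overestimation comes from, since a newly monitored item inherits the evicted item's count.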
