Unbiased Insights: Optimal Streaming Algorithms for ℓ_p Sampling, the Forget Model, and Beyond

Abstract

We study ℓ_p sampling and frequency moment estimation in a single-pass insertion-only data stream. For p ∈ (0, 2), we present a nearly space-optimal approximate ℓ_p sampler that uses O(log n log(1/δ)) bits of space, and for p = 2, we present a sampler with space complexity O(log^2 n log(1/δ)). This space complexity is optimal for p ∈ (0, 2) and improves upon prior work by a log n factor. We further extend our construction to a continuous ℓ_p sampler, which outputs a valid sample index at every point during the stream. Leveraging these samplers, we design nearly unbiased estimators for F_p in data streams that include forget operations, which reset individual element frequencies and introduce significant non-linear challenges. As a result, we obtain near-optimal algorithms for estimating F_p for all p in this model, originally proposed by Pavan, Chakraborty, Vinodchandran, and Meel [PODS'24], resolving all three open problems they posed. Furthermore, we generalize this model to what we call the suffix-prefix deletion model, and extend our techniques to estimate entropy as a corollary of our moment estimation algorithms. Finally, we show how to handle arbitrary coordinate-wise functions during the stream, for any g ∈ G, where G includes all (linear or non-linear) contraction functions.
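To make the quantities in the abstract concrete, here is a minimal offline sketch in Python. It is not the paper's algorithm: it stores every frequency exactly, so it uses linear space rather than the small space the abstract describes, and serves only as a reference for what a streaming sampler or estimator is meant to approximate. It assumes the standard definitions F_p = Σ_i x_i^p and an ℓ_p sample that returns index i with probability x_i^p / F_p, and it assumes that a forget operation resets an element's frequency to zero. The function names and the `("insert", item)` / `("forget", item)` stream encoding are hypothetical, chosen for illustration.

```python
import random
from collections import Counter


def final_frequencies(stream):
    """Replay a stream of ("insert", item) / ("forget", item) updates exactly."""
    freq = Counter()
    for op, item in stream:
        if op == "insert":
            freq[item] += 1
        elif op == "forget":
            # Assumption: a forget resets this item's frequency to zero.
            freq.pop(item, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return freq


def frequency_moment(freq, p):
    """Exact F_p: the sum of x_i^p over items with positive frequency."""
    return sum(count ** p for count in freq.values() if count > 0)


def lp_sample(freq, p, rng=random):
    """Exact l_p sample: return item i with probability x_i^p / F_p."""
    items = [item for item, count in freq.items() if count > 0]
    weights = [freq[item] ** p for item in items]
    return rng.choices(items, weights=weights, k=1)[0]


stream = [
    ("insert", "a"), ("insert", "a"), ("insert", "b"),
    ("forget", "a"),  # discards both earlier insertions of "a"
    ("insert", "a"), ("insert", "b"), ("insert", "c"),
]
freq = final_frequencies(stream)
f2 = frequency_moment(freq, 2)
sample = lp_sample(freq, 2, random.Random(0))
```

In this example the final frequencies are a = 1, b = 2, c = 1, so F_2 = 6 and an exact ℓ_2 sample returns "b" with probability 4/6. The forget on "a" shows the non-linearity the abstract mentions: its effect depends on the current frequency of "a", not on a fixed additive update.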
