Revisiting Frequency Moment Estimation in Random Order Streams
Abstract
We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It is well-known that using $p$-stable distributions one can approximate any of these moments up to a multiplicative $(1+\varepsilon)$-factor using $O(\varepsilon^{-2} \log n)$ bits of space, and this space bound is optimal up to a constant factor in the turnstile streaming model. We show that, surprisingly, if one instead considers the popular random-order model of insertion-only streams, in which the updates to the underlying vector arrive in a random order, then one can beat this space bound and achieve $\tilde{O}(\varepsilon^{-2} + \log n)$ bits of space, where the $\tilde{O}$ hides $\mathrm{poly}(\log(1/\varepsilon) + \log\log n)$ factors. If $\varepsilon^{-2} \approx \log n$, this represents a roughly quadratic improvement over the space achievable in turnstile streams. Our algorithm is in fact deterministic, and we show our space bound is optimal up to $\mathrm{poly}(\log(1/\varepsilon) + \log\log n)$ factors for deterministic algorithms in the random-order model. We also obtain a similar improvement in space for $p = 2$ whenever $F_2 \gtrsim \log n \cdot F_1$.