Perfect Lp Sampling in a Data Stream

Abstract

In this paper, we resolve the one-pass space complexity of $L_p$ sampling for $p \in (0,2)$. Given a stream of updates (insertions and deletions) to the coordinates of an underlying vector $f \in \mathbb{R}^n$, a perfect $L_p$ sampler must output an index $i$ with probability $|f_i|^p/\|f\|_p^p$, and is allowed to fail with some probability $\delta$. So far, for $p > 0$ no algorithm has been shown to solve the problem exactly using $\mathrm{poly}(\log n)$ bits of space. In 2010, Monemizadeh and Woodruff introduced an approximate $L_p$ sampler, which outputs $i$ with probability $(1 \pm \nu)|f_i|^p/\|f\|_p^p$, using space polynomial in $\nu^{-1}$ and $\log(n)$. The space complexity was later reduced by Jowhari, Sağlam, and Tardos to roughly $O(\nu^{-p} \log^2 n \log \delta^{-1})$ for $p \in (0,2)$, which tightly matches the $\Omega(\log^2 n \log \delta^{-1})$ lower bound in terms of $n$ and $\delta$, but is loose in terms of $\nu$. Given these nearly tight bounds, it is perhaps surprising that no lower bound is known in terms of $\nu$, not even a bound of $\Omega(\nu^{-1})$. In this paper, we explain this phenomenon by demonstrating the existence of an $O(\log^2 n \log \delta^{-1})$-bit perfect $L_p$ sampler for $p \in (0,2)$. This shows that $\nu$ need not factor into the space of an $L_p$ sampler, which closes the complexity of the problem for this range of $p$. For $p=2$, our bound is $O(\log^3 n \log \delta^{-1})$ bits, which matches the prior best known upper bound in terms of $n$ and $\delta$, but has no dependence on $\nu$. For $p<2$, our bound holds in the random oracle model, matching the lower bounds in that model. Moreover, we show that our algorithm can be derandomized with only an $O((\log \log n)^2)$ blow-up in the space (and no blow-up for $p=2$). Our derandomization technique is general, and can be used to derandomize a large class of linear sketches.
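
To make the sampling guarantee concrete, the following is a minimal offline sketch of the distribution a perfect Lp sampler must match. It is not the algorithm of this paper: it stores the whole vector f explicitly, using space linear in n, whereas the result above concerns samplers using polylogarithmic space. The function names `apply_stream`, `target_distribution`, and `offline_lp_sample`, and the toy update stream, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def apply_stream(updates):
    """Accumulate (index, delta) updates into an explicit vector f.

    Assumption: this keeps every coordinate, so it uses O(n) space.
    It only serves as a reference for the target distribution.
    """
    f = defaultdict(float)
    for i, delta in updates:
        f[i] += delta  # insertions (delta > 0) and deletions (delta < 0)
    return f


def target_distribution(f, p):
    """Return {i: |f_i|^p / ||f||_p^p} over the nonzero coordinates of f."""
    weights = {i: abs(v) ** p for i, v in f.items() if v != 0}
    total = sum(weights.values())  # this is ||f||_p^p
    if total == 0:
        raise ValueError("f is the zero vector; the distribution is undefined")
    return {i: w / total for i, w in weights.items()}


def offline_lp_sample(f, p, rng=random):
    """Draw one index i with probability exactly |f_i|^p / ||f||_p^p."""
    dist = target_distribution(f, p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


# Toy stream: coordinate 2 is inserted and then fully deleted.
updates = [(0, 3), (1, 1), (2, 5), (2, -5), (0, -1)]
f = apply_stream(updates)
dist_p1 = target_distribution(f, p=1)
dist_p2 = target_distribution(f, p=2)
sample = offline_lp_sample(f, p=1)
```

In this toy stream the final vector has f_0 = 2, f_1 = 1, and f_2 = 0, so coordinate 2 must never be returned, and a larger p shifts more probability mass onto the heavier coordinate 0.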
