Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

Abstract

Given a dataset $D$, we are interested in computing the mean of the subset of $D$ that matches a predicate. ABae leverages stratified sampling and proxy models to compute this statistic efficiently under a sampling budget $N$. In this document, we theoretically analyze ABae and show that the mean squared error (MSE) of its estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2} N_2^{-3/2})$, where $N = K \cdot N_1 + N_2$ for some integer constant $K$, and $K \cdot N_1$ and $N_2$ are the numbers of samples used in Stage 1 and Stage 2 of ABae, respectively. Hence, if a constant fraction of the total sample budget $N$ is allocated to each stage, so that $N_1 = \Theta(N)$ and $N_2 = \Theta(N)$, each of the three terms is $O(N^{-1})$ and the MSE is $O(N^{-1})$. This matches the MSE rate of the optimal stratified sampling algorithm that knows the predicate positive rate and standard deviation of each stratum a priori.
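
The abstract does not spell out ABae's procedure, so the Python sketch below is only an illustration of a two-stage stratified estimator under stated assumptions. It assumes the records are already partitioned into K strata (for example by proxy score), that Stage 1 draws N1 records per stratum to estimate each stratum's positive rate p_k and standard deviation sigma_k, and that Stage 2 spends the remaining N2 draws in proportion to sqrt(p_k) * sigma_k. That allocation rule, sampling with replacement, the reuse of Stage 1 draws in the final estimate, and the function names `allocate`, `stratum_stats`, and `two_stage_estimate` are all assumptions made for illustration, not the paper's code.

```python
import math
import random

# Illustrative sketch only: the Stage 2 allocation rule and the reuse of
# Stage 1 draws are assumptions, not ABae's published implementation.


def allocate(budget, weights):
    """Split an integer budget across strata in proportion to weights,
    using largest-remainder rounding so the parts sum exactly to budget."""
    total = sum(weights)
    if total <= 0:
        # No stratum showed any signal in Stage 1: fall back to a uniform split.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    raw = [budget * w / total for w in weights]
    alloc = [int(r) for r in raw]
    leftover = budget - sum(alloc)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


def stratum_stats(draws, predicate, statistic):
    """Return (positive rate, mean, sample std) of statistic over predicate-positive draws."""
    values = [statistic(x) for x in draws if predicate(x)]
    p = len(values) / len(draws) if draws else 0.0
    if not values:
        return p, 0.0, 0.0
    mean = sum(values) / len(values)
    if len(values) < 2:
        return p, mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return p, mean, math.sqrt(var)


def two_stage_estimate(strata, predicate, statistic, n1, n2, rng):
    """Hypothetical two-stage stratified estimate of the mean of `statistic`
    over records satisfying `predicate`, using K*n1 + n2 sampled records."""
    # Stage 1: n1 uniform draws (with replacement) from every stratum.
    draws = [[rng.choice(s) for _ in range(n1)] for s in strata]
    stage1 = [stratum_stats(d, predicate, statistic) for d in draws]
    # Stage 2: spend n2 draws in proportion to sqrt(p_k) * sigma_k (assumed rule).
    weights = [math.sqrt(p) * sd for p, _, sd in stage1]
    for i, extra in enumerate(allocate(n2, weights)):
        draws[i].extend(rng.choice(strata[i]) for _ in range(extra))
    # Combine both stages with a size-weighted ratio of per-stratum estimates.
    # Predicate results are recomputed here for brevity; a real system would
    # cache them because the predicate is the expensive part.
    num = den = 0.0
    for s, d in zip(strata, draws):
        p, mean, _ = stratum_stats(d, predicate, statistic)
        num += len(s) * p * mean
        den += len(s) * p
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Toy data: records 0..29999 in K=5 contiguous strata (a stand-in for proxy-score strata).
    records = list(range(30_000))
    strata = [records[i * 6_000:(i + 1) * 6_000] for i in range(5)]
    est = two_stage_estimate(strata, lambda x: x % 3 == 0, float,
                             n1=200, n2=2_000, rng=random.Random(0))
    print(f"estimate = {est:.1f}, true mean = 14998.5")
```

Largest-remainder rounding keeps the Stage 2 spend at exactly N2, and the uniform fallback avoids dividing by zero when Stage 1 sees no positives anywhere. A stratum whose Stage 1 draws contain no positives still receives no Stage 2 samples in this sketch, which is one of the cases a careful implementation would need to handle.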
