Multi-Objective Weighted Sampling
Abstract
Multi-objective samples are powerful and versatile summaries of large data sets. For a set of keys x ∈ X and associated values f_x ≥ 0, a weighted sample taken with respect to f allows us to approximate segment-sum statistics Sum(f;H) = ∑_{x∈H} f_x, for any subset H of the keys, with statistically guaranteed quality that depends on the sample size and the relative weight of H. When estimating Sum(g;H) for g ≠ f, however, the quality guarantees are lost. A multi-objective sample with respect to a set of functions F provides, for each f ∈ F, the same statistical guarantees as a dedicated weighted sample while minimizing the summary size. We analyze properties of multi-objective samples and present sampling schemes and meta-algorithms for estimation and optimization, showcasing two important application domains. The first is key-value data sets, where different functions f ∈ F applied to the values correspond to different statistics such as moments, thresholds, capping, and sum. A multi-objective sample allows us to approximate all statistics in F. The second is metric spaces, where keys are points and each f ∈ F is defined by a set of points C, with f_x being the service cost of x by C, so that Sum(f;X) models the centrality or clustering cost of C. A multi-objective sample allows us to estimate the cost for each f ∈ F. In these domains, multi-objective samples are often small, are efficient to construct, and enable scalable estimation and optimization. We aim here to facilitate further applications of this powerful technique.
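To make the segment-sum setting concrete, the following is a minimal Python sketch for the key-value domain. It is an illustration under stated assumptions, not necessarily the scheme the paper develops. It assumes Poisson sampling with probability proportional to size, so that key x is included with probability min(1, k·f_x / Sum(f;X)). It also assumes one natural way to combine objectives, which is to include each key with the maximum of its per-function probabilities, and it estimates Sum(g;H) by inverse-probability weighting. All function names and the parameter k are hypothetical.

```python
import random


def pps_probabilities(weights, k):
    """Inclusion probabilities min(1, k * f_x / Sum(f; X)) for one function f.

    `weights` maps each key x to f_x >= 0; `k` is the target sample size.
    """
    total = sum(weights.values())
    if total == 0:
        return {x: 0.0 for x in weights}
    return {x: min(1.0, k * w / total) for x, w in weights.items()}


def multi_objective_probabilities(functions, values, k):
    """Assumed combination rule: per-key maximum over all f in F."""
    probs = {x: 0.0 for x in values}
    for f in functions:
        p_f = pps_probabilities({x: f(v) for x, v in values.items()}, k)
        for x in probs:
            probs[x] = max(probs[x], p_f[x])
    return probs


def poisson_sample(probs, rng):
    """Include each key independently; keep its probability for estimation."""
    return {x: p for x, p in probs.items() if rng.random() < p}


def estimate_segment_sum(sample, g, values, segment):
    """Inverse-probability estimate of Sum(g; H) for the segment H."""
    return sum(g(values[x]) / p for x, p in sample.items() if x in segment)


if __name__ == "__main__":
    rng = random.Random(0)
    values = {f"key{i}": float(i % 50 + 1) for i in range(10_000)}
    # F: count, sum, second moment, and capping at 10.
    F = [lambda v: 1.0, lambda v: v, lambda v: v * v, lambda v: min(v, 10.0)]
    probs = multi_objective_probabilities(F, values, k=200)
    sample = poisson_sample(probs, rng)
    segment = {x for x in values if x.endswith("7")}
    for g in F:
        exact = sum(g(values[x]) for x in segment)
        approx = estimate_segment_sum(sample, g, values, segment)
        print(f"exact={exact:.1f} estimate={approx:.1f}")
```

Under this rule the expected sample size is at most |F|·k, since each key's probability is bounded by the sum of its per-function probabilities. It can be much smaller when the functions in F place weight on the same keys.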