Approximation Preserving Coresets
Abstract
Clustering in a big data setting is an intensively studied problem, with coresets emerging as one of the important paradigms in this line of work. Given a cost function cost(P,S) mapping input points P and a solution S to an objective value, a coreset is a typically weighted sketch Ω⊂eq P such that cost(Ω,S)≈ cost(P,S). In practice, coreset sizes much smaller than those suggested by theoretical guarantees are often found to be sufficient. In this paper, we offer an explanation for this phenomenon. Smaller coreset sizes suffice if we only wish to preserve the costs of good solutions, i.e., solutions with low cost. We define and devise approximation-preserving coresets, which provide a weaker guarantee than strong coresets, which apply to all solutions, while providing stronger guarantees than weak coresets, which apply only to the optimum solution. We complement this result by showing that even a very small distortion in the approximation factor cannot admit coresets of this size.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.