Coresets for Data Discretization and Sine Wave Fitting
Abstract
In the monitoring problem, the input is an unbounded stream $P = p_1, p_2, \cdots$ of integers in $[N] := \{1, \cdots, N\}$ that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c \in C} cost(P,c) + \lambda(c)$, where $cost(P,c) = \sum_{i=1}^{n} \sin^2\left(\frac{2\pi}{N} p_i c\right)$, $C \subseteq [N]$ is a feasible set of solutions, and $\lambda$ is a given regularization function. For any approximation error $\varepsilon > 0$, we prove that every set $P$ of $n$ integers has a weighted subset $S \subseteq P$ (sometimes called core-set) of cardinality $|S| \in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c \in [N]$) up to a multiplicative factor of $1 \pm \varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.
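To make the objective concrete, the following is a minimal Python sketch of the cost function and a brute-force search over the feasible set. It is an illustration only, not the authors' coreset construction or their released code: the names `cost`, `best_frequency`, `n_max`, `feasible`, `reg`, and `weights` are hypothetical, and the optional `weights` argument only shows how a weighted subset $S$ would be evaluated in place of the full input $P$.

```python
import math


def cost(points, c, n_max, weights=None):
    """Weighted sum of sin^2((2*pi/N) * p * c) over the points.

    `n_max` plays the role of N. With `weights=None` every point has
    weight 1, which gives cost(P, c) as defined in the abstract.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * math.sin(2 * math.pi / n_max * p * c) ** 2
        for p, w in zip(points, weights)
    )


def best_frequency(points, n_max, feasible=None, reg=lambda c: 0.0, weights=None):
    """Brute-force argmin of cost(P, c) + reg(c) over the feasible set C.

    `feasible` defaults to all of [N] = {1, ..., N}; `reg` stands in for
    the regularization function lambda and defaults to zero (assumption).
    """
    if feasible is None:
        feasible = range(1, n_max + 1)
    return min(feasible, key=lambda c: cost(points, c, n_max, weights) + reg(c))
```

This search evaluates the cost on all $n$ points for each candidate $c$. A weighted subset with the $1 \pm \varepsilon$ guarantee could be passed as `points` and `weights` instead, so that each evaluation touches only $|S|$ points.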