Optimal Coreset for Gaussian Kernel Density Estimation
Abstract
Given a point set $P \subset \mathbb{R}^d$, the kernel density estimate of $P$ is defined as \[ G_P(x) = \frac{1}{|P|} \sum_{p \in P} e^{-\lVert x - p \rVert^2} \] for any $x \in \mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $Q$ approximates that of $P$. Such a subset $Q$ is called a coreset. Our main technique is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory, leveraging Banaszczyk's Theorem. When $d > 1$ is a constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\right)$, where $\varepsilon$ is the approximation error, as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. This is the first result to break the $\sqrt{\log}$ factor barrier, even when $d = 2$.
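To make the definitions concrete, the following minimal sketch evaluates the Gaussian kernel density estimate defined above and measures how well a subset tracks the full point set. It assumes a halving step, a common way to turn a ±1 coloring into a smaller subset: keep only the points colored +1. The coloring here is a random balanced one, used purely as a stand-in. It is not the Banaszczyk-based coloring of this work and carries none of its guarantees. The names `gaussian_kde`, `signed_discrepancy`, and `max_kde_error` are hypothetical, and the error is measured over a finite set of query points rather than all of R^d.

```python
import math
import random


def kernel(x, p):
    """Gaussian kernel exp(-||x - p||^2) between two points given as tuples."""
    return math.exp(-sum((xi - pi) ** 2 for xi, pi in zip(x, p)))


def gaussian_kde(points, x):
    """Kernel density estimate: (1/|P|) * sum over p in P of exp(-||x - p||^2)."""
    return sum(kernel(x, p) for p in points) / len(points)


def signed_discrepancy(points, colors, x):
    """Signed sum of kernel values under a coloring with values in {-1, +1}."""
    return sum(c * kernel(x, p) for p, c in zip(points, colors))


def max_kde_error(P, Q, queries):
    """Largest |KDE_P(x) - KDE_Q(x)| over a finite query set (proxy for the sup)."""
    return max(abs(gaussian_kde(P, x) - gaussian_kde(Q, x)) for x in queries)


rng = random.Random(0)
P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]

# Stand-in coloring (assumption): random and balanced, NOT the paper's construction.
colors = [1] * (len(P) // 2) + [-1] * (len(P) // 2)
rng.shuffle(colors)

# Halving step: keep the points colored +1.
Q = [p for p, c in zip(P, colors) if c == 1]

queries = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(200)]
err = max_kde_error(P, Q, queries)
print(f"|P| = {len(P)}, |Q| = {len(Q)}, max error over queries = {err:.4f}")
```

For a balanced coloring, the difference between the two estimates at a point x equals the signed discrepancy at x divided by |P|, up to sign. This is why a low-discrepancy coloring yields a subset with a small approximation error.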