Kernel Thinning

Lester Mackey

Abstract

We introduce kernel thinning, a new procedure for compressing a distribution $P$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $k$ and $O(n^2)$ time, kernel thinning compresses an $n$-point approximation to $P$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. The maximum discrepancy in integration error is $O_d(n^{-1/2}\sqrt{\log n})$ in probability for compactly supported $P$ and $O_d(n^{-1/2}(\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $P$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $P$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $P$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. Moreover, the same construction delivers near-optimal $L^\infty$ coresets in $O(n^2)$ time. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
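The worst-case integration error across a reproducing kernel Hilbert space is the maximum mean discrepancy (MMD) between two point sets, and it can be computed directly from kernel evaluations. The sketch below is only an illustration of that metric and of the standard thinning baseline named in the abstract; it does not implement kernel thinning itself. The function names (`gaussian_kernel`, `mmd`, `standard_thin`), the unit bandwidth, the Gaussian target, and the coreset size $m = \sqrt{n}$ are all assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Kernel matrix with entries exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions of the rows of x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))

def standard_thin(points, m):
    """Standard thinning baseline: keep every (n // m)-th point, m in total."""
    step = len(points) // m
    return points[step - 1 :: step][:m]

# Illustrative setup (assumed): n draws from a standard Gaussian in d = 2.
rng = np.random.default_rng(0)
n, d = 1024, 2
sample = rng.standard_normal((n, d))

m = int(np.sqrt(n))                  # coreset size sqrt(n)
coreset = standard_thin(sample, m)   # baseline, not kernel thinning

print("MMD(sample, standard-thinned coreset):", mmd(sample, coreset))
```

Replacing `standard_thin` with any other compression routine that returns `m` rows gives a like-for-like comparison under the same metric.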
