Generic Coreset for Scalable Learning of Monotonic Kernels: Logistic Regression, Sigmoid and more
Abstract
Coreset (or core-set) is a small weighted subset Q of an input set P with respect to a given monotonic function f:R that provably approximates its fitting loss Σp∈ Pf(p· x) to any given x∈Rd. Using Q we can obtain approximation of x* that minimizes this loss, by running existing optimization algorithms on Q. In this work we provide: (i) A lower bound which proves that there are sets with no coresets smaller than n=|P| for general monotonic loss functions. (ii) A proof that, under a natural assumption that holds e.g. for logistic regression and the sigmoid activation functions, a small coreset exists for any input P. (iii) A generic coreset construction algorithm that computes such a small coreset Q in O(nd+n n) time, and (iv) Experimental results which demonstrate that our coresets are effective and are much smaller in practice than predicted in theory.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.