Pack only the essentials: Adaptive dictionary learning for kernel ridge regression

Michal Valko

Pack only the essentials: Adaptive dictionary learning for kernel ridge regression

Abstract

One of the major limits of kernel ridge regression (KRR) is that storing and manipulating the kernel matrix Kn for n samples requires O(n2) space, which rapidly becomes unfeasible for large n. Nystrom approximations reduce the space complexity to O(nm) by sampling m columns from Kn. Uniform sampling preserves KRR accuracy (up to epsilon) only when m is proportional to the maximum degree of freedom of Kn, which may require O(n) columns for datasets with high coherence. Sampling columns according to their ridge leverage scores (RLS) gives accurate Nystrom approximations with m proportional to the effective dimension, but computing exact RLS also requires O(n2) space. (Calandriello et al. 2016) propose INK-Estimate, an algorithm that processes the dataset incrementally and updates RLS, effective dimension, and Nystrom approximations on-the-fly. Its space complexity scales with the effective dimension but introduces a dependency on the largest eigenvalue of Kn, which in the worst case is O(n). In this paper we introduce SQUEAK, a new algorithm that builds on INK-Estimate but uses unnormalized RLS. As a consequence, the algorithm is simpler, does not need to estimate the effective dimension for normalization, and achieves a space complexity that is only a constant factor worse than exact RLS sampling.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…