Bayes meets Bernstein at the Meta Level: an Analysis of Fast Rates in Meta-Learning with PAC-Bayes

Abstract

Bernstein's condition is a key assumption that guarantees fast rates in machine learning. For example, under this condition, the Gibbs algorithm with prior π has an excess risk in O(d_π/n), as opposed to the standard O(√(d_π/n)), where n denotes the number of observations and d_π is a complexity parameter that depends on the prior π. In this paper, we examine the Gibbs algorithm in the context of meta-learning, i.e., when learning the prior π from T tasks (with n observations each) generated by a meta distribution. Our main result is that Bernstein's condition always holds at the meta level, regardless of its validity at the observation level. This implies that the additional cost of learning the Gibbs prior π, which reduces the term d_π across tasks, is in O(1/T) instead of the expected O(1/√T). We further illustrate how this result improves on standard rates in three different settings: discrete priors, Gaussian priors, and mixture-of-Gaussians priors.
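To make the Gibbs algorithm concrete in the simplest of the three settings (a discrete prior), the sketch below computes a Gibbs posterior over a finite parameter set. It assumes the common form ρ(θ) ∝ π(θ)·exp(−λ·r_n(θ)), where r_n is the empirical risk on the n observations and λ > 0 is an inverse temperature. The squared loss, the toy binary data, the grid of parameters, and the choice λ = n are illustrative assumptions, not details taken from the paper; the helper names `gibbs_posterior` and `empirical_risk` are hypothetical. In the meta-learning setting described above, the `prior` vector is the object that would be learned across the T tasks.

```python
import math
from typing import Sequence


def empirical_risk(theta: float, data: Sequence[float]) -> float:
    """Average squared loss of the constant predictor theta (illustrative loss choice)."""
    return sum((x - theta) ** 2 for x in data) / len(data)


def gibbs_posterior(prior: Sequence[float], risks: Sequence[float], lam: float) -> list[float]:
    """Gibbs posterior on a finite set: rho(theta) ∝ prior(theta) * exp(-lam * risk(theta)).

    Assumes at least one parameter has strictly positive prior mass.
    """
    # Work in log space; parameters with zero prior mass get log-weight -inf.
    logw = [(math.log(p) if p > 0 else -math.inf) - lam * r for p, r in zip(prior, risks)]
    # Subtract the maximum log-weight before exponentiating (log-sum-exp trick).
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    # Hypothetical toy task: n = 8 binary observations, 5 candidate parameters.
    thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
    data = [1, 0, 1, 1, 0, 1, 1, 1]
    risks = [empirical_risk(t, data) for t in thetas]
    uniform_prior = [1.0 / len(thetas)] * len(thetas)
    rho = gibbs_posterior(uniform_prior, risks, lam=len(data))
    for t, p in zip(thetas, rho):
        print(f"theta={t:.2f}  posterior={p:.3f}")
```

The posterior mass concentrates on the parameter with the smallest empirical risk (here θ = 0.75, the sample mean), and larger λ sharpens this concentration. A prior that already places most of its mass near good parameters needs fewer observations to reach the same concentration, which is the intuition behind learning π across tasks to reduce d_π.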
