Higher Order Generalization Error for First Order Discretization of Langevin Diffusion
Abstract
We propose a novel approach to analyze generalization error for discretizations of Langevin diffusion, such as the stochastic gradient Langevin dynamics (SGLD). For an ε tolerance of expected generalization error, it is known that a first order discretization can reach this target if we run (ε-1 (ε-1) ) iterations with (ε-1) samples. In this article, we show that with additional smoothness assumptions, even first order methods can achieve arbitrarily runtime complexity. More precisely, for each N>0, we provide a sufficient smoothness condition on the loss function such that a first order discretization can reach ε expected generalization error given ( ε-1/N (ε-1) ) iterations with (ε-1) samples.
0