Convex SGD: Generalization Without Early Stopping

Abstract

We consider the generalization error associated with stochastic gradient descent on a smooth convex function over a compact set. We show the first bound on the generalization error that vanishes when the number of iterations T and the dataset size n go to infinity at arbitrary rates; our bound scales as Õ(1/√T + 1/√n) with step-size α_t = 1/√t. In particular, strong convexity is not needed for stochastic gradient descent to generalize well.
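For readers who want a concrete picture of the setting, the following is a minimal sketch of projected stochastic gradient descent on a smooth convex loss over a compact set. It is not the paper's code: the least-squares loss, the Euclidean ball as the compact set, returning the final iterate, and the names `project_onto_ball`, `projected_sgd`, and `average_loss` are all illustrative assumptions. The step-size schedule is passed in as the callable `step_size`, so any decaying schedule can be tried.

```python
import math
import random


def project_onto_ball(w, radius):
    """Euclidean projection onto the compact set {w : ||w|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm <= radius:
        return list(w)
    return [v * radius / norm for v in w]


def projected_sgd(data, step_size, num_iters, radius=1.0, seed=0):
    """Run projected SGD on the loss 0.5 * (<w, x> - y)^2 (assumed example).

    data      : list of (x, y) pairs, x a list of floats, y a float
    step_size : callable t -> alpha_t for t = 1, 2, ...
    Returns the final iterate (an assumption; averaging is another option).
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for t in range(1, num_iters + 1):
        # Sample one training point uniformly at random (with replacement).
        x, y = data[rng.randrange(len(data))]
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        alpha = step_size(t)
        # Gradient step on the sampled loss, then project back onto the ball.
        stepped = [wi - alpha * residual * xi for wi, xi in zip(w, x)]
        w = project_onto_ball(stepped, radius)
    return w


def average_loss(w, data):
    """Mean of 0.5 * (<w, x> - y)^2 over a dataset."""
    total = 0.0
    for x, y in data:
        total += 0.5 * (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
    return total / len(data)
```

Under these assumptions, the generalization error of a run can be estimated as the difference between `average_loss` on held-out data and `average_loss` on the training data used by `projected_sgd`.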
