On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that $$\mathbb{E} \, \|A x_{k+1} - b\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \|A x_k - b\|_2^2 - \frac{2}{\|A\|_F^2} \|A^T A (x_k - x)\|_2^2.$$ This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms. If $x_k - x$ is mainly composed of singular vectors associated with large singular values, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values smooths.
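The mechanism behind the last term can be illustrated numerically. The following is a minimal sketch, not the paper's own code, and it rests on an assumption the abstract does not state: that the stochastic gradient step is the randomized Kaczmarz update, with row $a_i$ of $A$ drawn with probability $\|a_i\|_2^2 / \|A\|_F^2$. Under that assumption the expected error after one step is $(I - A^T A / \|A\|_F^2)(x_k - x)$, so the component of the error along the $j$-th right singular vector shrinks by the factor $1 - \sigma_j^2 / \|A\|_F^2$, which is smallest for large singular values. The function names `kaczmarz_step` and `expected_step` are hypothetical.

```python
import numpy as np

def kaczmarz_step(A, b, x, i):
    """One step on row i: project x onto the hyperplane <a_i, x> = b_i.
    ASSUMPTION: this is the SGD variant meant; the abstract does not say."""
    a = A[i]
    return x + (b[i] - a @ x) / (a @ a) * a

def expected_step(A, b, x):
    """Exact expectation of one step when row i is drawn with
    probability ||a_i||^2 / ||A||_F^2 (assumed sampling rule)."""
    probs = np.sum(A**2, axis=1) / np.sum(A**2)
    return sum(p * kaczmarz_step(A, b, x, i) for i, p in enumerate(probs))

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # invertible with probability one
x_true = rng.standard_normal(n)   # the solution x of Ax = b
b = A @ x_true
x0 = rng.standard_normal(n)       # current iterate x_k

fro2 = np.sum(A**2)               # squared Frobenius norm of A

# Expected error after one step, computed two ways.
lhs = expected_step(A, b, x0) - x_true
rhs = (np.eye(n) - A.T @ A / fro2) @ (x0 - x_true)

# Per-singular-vector contraction factors 1 - sigma_j^2 / ||A||_F^2.
U, s, Vt = np.linalg.svd(A)
factors = 1.0 - s**2 / fro2
coeffs_before = Vt @ (x0 - x_true)
coeffs_after = Vt @ lhs

print(np.allclose(lhs, rhs))
print(np.allclose(coeffs_after, factors * coeffs_before))
```

Because `np.linalg.svd` returns singular values in descending order, `factors` is increasing: error components belonging to large singular values contract fastest in expectation, leaving the error dominated by small singular values.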
