On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

Stefan Steinerberger

Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that
$$\mathbb{E}\, \|Ax_{k+1} - b\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \|Ax_k - b\|_2^2 - \frac{2}{\|A\|_F^2} \|A^T A (x_k - x)\|_2^2,$$
where $x$ denotes the solution of $Ax = b$. This is a curious inequality: the last term has one more matrix applied to the error $x_k - x$ than the remaining terms. If $x_k - x$ is mainly composed of singular vectors associated with large singular values, stochastic gradient descent leads to a quick regularization. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
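
To make the setting concrete, here is a minimal sketch of stochastic gradient descent on a small least-squares problem. The abstract does not spell out the update rule, so the sketch assumes the randomized Kaczmarz form: row $a_i$ is sampled with probability $\|a_i\|^2 / \|A\|_F^2$, and the iterate is then projected onto the equation $\langle a_i, x \rangle = b_i$. The function names `residual_sq` and `kaczmarz_sgd`, the 2 × 2 matrix, and the step count are hypothetical choices made for illustration only.

```python
import random

def residual_sq(A, x, b):
    """Squared residual ||Ax - b||_2^2 for a matrix A given as a list of rows."""
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))

def kaczmarz_sgd(A, b, x0, steps, seed=0):
    """Assumed SGD variant (randomized Kaczmarz), not taken from the abstract.

    Returns the final iterate and the history of squared residuals.
    """
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = list(x0)
    history = [residual_sq(A, x, b)]
    for _ in range(steps):
        # Sample row i with probability ||a_i||^2 / ||A||_F^2.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        # Project x onto the hyperplane <a_i, x> = b_i.
        gap = b[i] - sum(a * xi for a, xi in zip(A[i], x))
        x = [xi + gap / row_norms_sq[i] * a for xi, a in zip(x, A[i])]
        history.append(residual_sq(A, x, b))
    return x, history

# Small invertible example whose exact solution is x = (1, -1).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -2.0]
x_final, history = kaczmarz_sgd(A, b, x0=[0.0, 0.0], steps=500)
print("initial residual:", history[0])
print("final residual:", history[-1])
print("final iterate:", x_final)
```

Recording `history` makes it possible to inspect how $\|Ax_k - b\|_2^2$ evolves along a single run. The inequality above is a statement about the expectation, so an individual trajectory need not decrease at every step.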
