Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations

Abstract

We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations. Concretely, we show that given O(dr2) random linear measurements of a rank r positive semidefinite matrix X, we can recover X by parameterizing it by UU with U∈ Rd× d and minimizing the squared loss, even if r d. We prove that starting from a small initialization, gradient descent recovers X in O(r) iterations approximately. The results solve the conjecture of Gunasekar et al.'17 under the restricted isometry property. The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…