Memorizing Gaussians with no over-parameterization via gradient descent on neural networks

Abstract

We prove that a single step of gradient descent over a depth-two network with $q$ hidden neurons, starting from orthogonal initialization, can memorize $\Omega\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.
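The setting in the abstract can be illustrated with a small numerical sketch. It is a toy experiment under stated assumptions, not the construction used in the proof: the loss (a linear loss in the margin), the fixed output weights `u`, the step size `lr`, and the sizes `d`, `q`, `m` are all hypothetical choices made for illustration, and the sample count is far below the regime the theorem addresses.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, m = 200, 200, 200          # input dimension, hidden neurons, samples (toy sizes)

# m independent standard Gaussians in R^d with uniformly random +/-1 labels.
X = rng.standard_normal((m, d))
y = rng.choice([-1.0, 1.0], size=m)

# Orthogonal initialization: q orthonormal rows in R^d (needs q <= d).
Q, _ = np.linalg.qr(rng.standard_normal((d, q)))
W0 = Q.T                          # shape (q, d), W0 @ W0.T = I_q

# Fixed output weights with balanced signs (an assumption of this sketch).
u = np.where(np.arange(q) % 2 == 0, 1.0, -1.0) / np.sqrt(q)

def predict(W, X):
    """Depth-two network h(x) = sum_i u_i * |<w_i, x>| (absolute-value activation)."""
    return np.abs(X @ W.T) @ u

def accuracy(W):
    """Fraction of training points whose label matches the sign of the output."""
    return float(np.mean(np.sign(predict(W, X)) == y))

# One gradient step on the hidden weights for the assumed loss
# L(W) = -(1/m) * sum_j y_j * h(x_j).
S = np.sign(X @ W0.T)                               # (m, q): derivative of |.| at each pre-activation
grad = -u[:, None] * ((S * y[:, None]).T @ X) / m   # (q, d): dL/dW at W0
lr = 4.0 * m / d                                    # hypothetical step size
W1 = W0 - lr * grad

acc_before = accuracy(W0)
acc_after = accuracy(W1)
print(f"training accuracy before step: {acc_before:.2f}, after step: {acc_after:.2f}")
```

Comparing `acc_before` with `acc_after` shows how much of the random labeling a single step absorbs in this toy configuration. Increasing `m` toward `d * q` is a simple way to probe where this particular sketch stops fitting the labels.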
