Convergence Analysis of the Dynamics of a Special Kind of Two-Layered Neural Networks with ℓ1 and ℓ2 Regularization
Abstract
In this paper, we extend the convergence analysis of the dynamics of two-layered bias-free networks with one ReLU output. We consider two popular regularization terms, the ℓ1 and ℓ2 norms of the parameter vector w, and add them to the square loss function with coefficient λ/2. We prove that when λ is small, the weight vector w converges to the optimal solution ŵ (with respect to the new loss function) with probability ≥ (1-ε)(1-A_d)/2 under random initialization in a sphere centered at the origin, where ε is a small value and A_d is a constant. Numerical experiments, including phase diagrams and repeated simulations, verify our theory.
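As a rough illustration of the objective described in the abstract, the following is a minimal sketch of a bias-free network with one ReLU output, its square loss with a penalty weighted by λ/2, and a single gradient-descent step. The function names, the averaging over samples, the use of the squared Euclidean norm for the ℓ2 case, and the subgradient sign(w) for the ℓ1 case are all assumptions made for illustration; the abstract does not specify them, and this is not the paper's own code.

```python
import numpy as np

def relu_output(X, w):
    """Bias-free network with a single ReLU output: max(0, X @ w)."""
    return np.maximum(X @ w, 0.0)

def regularized_loss(X, y, w, lam, norm="l2"):
    """Square loss plus (lam / 2) times a penalty on w."""
    residual = relu_output(X, w) - y
    square_loss = 0.5 * np.mean(residual ** 2)
    # Assumption: "l2" means the squared Euclidean norm,
    # "l1" means the sum of absolute values.
    penalty = np.sum(w ** 2) if norm == "l2" else np.sum(np.abs(w))
    return square_loss + 0.5 * lam * penalty

def gradient_step(X, y, w, lam, lr, norm="l2"):
    """One (sub)gradient-descent step on the regularized loss."""
    pre = X @ w
    residual = np.maximum(pre, 0.0) - y
    active = (pre > 0).astype(X.dtype)  # ReLU gate: 1 where the unit is on
    grad_loss = X.T @ (residual * active) / X.shape[0]
    # Assumption: sign(w) is used as a subgradient of the l1 penalty.
    grad_penalty = 2.0 * w if norm == "l2" else np.sign(w)
    return w - lr * (grad_loss + 0.5 * lam * grad_penalty)
```

With λ = 0 and targets generated by the same w, the square loss and its gradient vanish, so a step leaves w unchanged; with λ > 0 the penalty pulls w toward the origin, which is why the minimizer of the new loss generally differs from the unregularized one.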