Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units
Abstract
This paper presents a general framework for norm-based capacity control for Lp,q weight normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an Lp,q normalization where q ≤ p*, and 1/p+1/p*=1, we discuss properties of a width-independent capacity control, which only depends on depth by a square root term. We further analyze the approximation properties of Lp,q weight normalized deep neural networks. In particular, for an L1,∞ weight normalized network, the approximation error can be controlled by the L1 norm of the output layer, and the corresponding generalization error only depends on the architecture by the square root of the depth.