Smaller generalization error derived for a deep residual neural network compared to shallow networks

Abstract

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $z_{\ell+1} = z_\ell + \mathrm{Re}\sum_{k=1}^{K} b_{\ell k}\, e^{\mathrm{i}\omega_{\ell k} z_\ell} + \mathrm{Re}\sum_{k=1}^{K} c_{\ell k}\, e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k} z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case where the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
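To make the layer recursion concrete, here is a minimal Python sketch of the forward pass, assuming a scalar state $z_\ell$ and an input $x \in \mathbb{R}^d$. The function name `residual_rff_forward`, the starting value $z_0 = 0$, and reading the network output off as $z_L$ are our own hypothetical choices. The Gaussian frequencies and random amplitudes in the usage lines are placeholders only; they are not the optimal frequency distribution or the training method derived in the paper.

```python
import cmath
import random


def residual_rff_forward(x, b, c, omega, omega_prime, z0=0.0):
    """Evaluate L residual random-Fourier-feature layers and return z_L.

    Layer l computes
        z_{l+1} = z_l + Re sum_k b[l][k] * exp(i * omega[l][k] * z_l)
                      + Re sum_k c[l][k] * exp(i * omega_prime[l][k] . x)
    with a scalar state z_l and an input vector x in R^d.
    The starting value z0 and returning z_L as the output are assumptions.
    """
    if not (len(b) == len(c) == len(omega) == len(omega_prime)):
        raise ValueError("b, c, omega and omega_prime must all have L layers")
    z = z0
    for b_l, c_l, w_l, wp_l in zip(b, c, omega, omega_prime):
        # Random Fourier features of the current state: e^{i omega_{lk} z_l}
        state_term = sum(bk * cmath.exp(1j * wk * z) for bk, wk in zip(b_l, w_l))
        # Random Fourier features of the input: e^{i omega'_{lk} . x}
        input_term = sum(
            ck * cmath.exp(1j * sum(wi * xi for wi, xi in zip(wpk, x)))
            for ck, wpk in zip(c_l, wp_l)
        )
        # Residual update: both sums are evaluated at the old state z_l
        z = z + state_term.real + input_term.real
    return z


# Usage: L = 3 layers, K = 4 features per sum, d = 2 inputs.
# Gaussian frequencies and small random amplitudes are placeholders only.
rng = random.Random(0)
L, K, d = 3, 4, 2
omega = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(L)]
omega_prime = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)] for _ in range(L)]
b = [[complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(K)] for _ in range(L)]
c = [[complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(K)] for _ in range(L)]
result = residual_rff_forward([0.3, -0.7], b, c, omega, omega_prime)
print(result)
```

Keeping $z_\ell$ scalar mirrors the layer formula as written. In a trained network the amplitudes $b_{\ell k}$ and $c_{\ell k}$ would be fitted to data and the frequencies drawn from a chosen distribution; both steps are left out of this sketch.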
