On the Provable Generalization of Recurrent Neural Networks
Abstract
Recurrent neural networks (RNNs) are a fundamental structure in deep learning. Recent works study the training process of over-parameterized neural networks and show that such networks can learn functions in some notable concept classes with a provable generalization error bound. In this paper, we analyze training and generalization for RNNs with random initialization and provide the following improvements over recent works: 1) For an RNN with input sequence x = (X_1, X_2, ..., X_L), previous works study learning functions that are sums of f(β_l^T X_l) and require the normalization condition ||X_l|| ≤ ε for some very small ε depending on the complexity of f. Using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without this normalization condition, and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length L. 2) Moreover, we prove a novel result for learning N-variable functions of the input sequence of the form f(β^T [X_{l_1}, ..., X_{l_N}]), which do not belong to the "additive" concept class, i.e., sums of functions f(X_l). We show that when either N or l_0 = max(l_1, ..., l_N) − min(l_1, ..., l_N) is small, f(β^T [X_{l_1}, ..., X_{l_N}]) is learnable with the number of iterations and samples scaling almost-polynomially in the input length L.
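To make the two concept classes in the abstract concrete, the following is a minimal sketch of the target functions only, not of the paper's RNN or its training procedure. The choice f = tanh, the helper names, the 0-indexed positions, and the toy dimensions are all assumptions made for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def additive_target(xs, betas, f=math.tanh):
    """Additive class: sum over positions l of f(beta_l^T X_l).

    xs    -- list of L token vectors X_1..X_L
    betas -- list of L weight vectors beta_1..beta_L (hypothetical values)
    f     -- scalar function; tanh is an illustrative assumption
    """
    return sum(f(dot(b, x)) for b, x in zip(betas, xs))

def n_variable_target(xs, beta, positions, f=math.tanh):
    """N-variable class: f(beta^T [X_{l_1}, ..., X_{l_N}]).

    The selected tokens are concatenated into one vector before the inner
    product, so f sees an interaction between positions. This is why the
    function is not a sum of per-position terms in general.
    positions -- 0-indexed token positions (an indexing choice of this sketch)
    """
    concat = [v for l in positions for v in xs[l]]
    return f(dot(beta, concat))

# Toy sequence with L = 3 tokens of dimension 2.
xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
betas = [[0.5, 0.5]] * 3
additive_value = additive_target(xs, betas)

# N = 2 selected positions (first and last token), beta of dimension 4.
joint_value = n_variable_target(xs, [1.0, 0.0, 0.0, -1.0], positions=(0, 2))
```

In the second call the two selected tokens are mixed inside a single application of f, which is the structural difference from the additive class.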