ℓ1-regularized Neural Networks are Improperly Learnable in Polynomial Time
Abstract
We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method that, with probability at least 1 - δ, learns a predictor whose generalization error is at most ε worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in (1/ε, log(1/δ), F(k,L)), where F(k,L) is a function that depends on (k,L) and on the activation function but is independent of the number of neurons. The algorithm applies to both sigmoid-like and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.
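To make the weight constraint concrete, here is a minimal Python sketch that checks whether every neuron's incoming weights have ℓ1-norm at most L. It assumes each layer is stored as a list of rows, with one row of incoming weights per neuron. The names `max_incoming_l1_norm`, `satisfies_l1_bound`, and `example_layers` are hypothetical and do not come from the paper. The sketch only tests the constraint; it is not the kernel-based learning method itself.

```python
def max_incoming_l1_norm(layers):
    """Largest l1-norm over the incoming weight vectors of all neurons.

    `layers` is a list of layers; each layer is a list of neurons;
    each neuron is a list of its incoming weights (assumed layout).
    """
    return max(
        sum(abs(w) for w in neuron_weights)
        for layer in layers
        for neuron_weights in layer
    )


def satisfies_l1_bound(layers, L):
    """True if every neuron's incoming weights have l1-norm at most L."""
    return max_incoming_l1_norm(layers) <= L


# Illustrative weights only (not taken from the paper).
example_layers = [
    [[0.5, -0.25, 0.0], [0.0, 1.0, 0.0]],  # hidden layer: 2 neurons, 3 inputs each
    [[0.75, -0.25]],                       # output layer: 1 neuron, 2 inputs
]

print(satisfies_l1_bound(example_layers, L=1.0))
```

A neuron with only a few nonzero incoming weights of bounded magnitude has a small ℓ1-norm, which is one way to read the abstract's remark about sufficiently sparse networks.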