Learning Halfspaces and Neural Networks with Random Initialization

Abstract

We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are L-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk ε>0. The time complexity is polynomial in the input dimension d and the sample size n, but exponential in the quantity (L/ε²) log(L/ε). These algorithms run multiple rounds of random initialization followed by arbitrary optimization steps. We further show that if the data is separable by some neural network with constant margin γ>0, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin Ω(γ). As a consequence, the algorithm achieves arbitrary generalization error ε>0 with poly(d,1/ε) sample and time complexity. We establish the same learnability result when the labels are randomly flipped with probability η<1/2.
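The "multiple rounds of random initialization followed by arbitrary optimization steps" pattern can be illustrated with a minimal sketch for a halfspace. This is not the paper's algorithm or its analysis: the ramp loss (a 1-Lipschitz surrogate), the subgradient step rule, the unit-ball constraint, and all function names below are assumptions chosen for illustration. The only property the sketch relies on is that a local step never increases the empirical risk, so the best of several restarts is at least as good as any single initialization.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ramp_loss(margin):
    # Assumed 1-Lipschitz loss: 1 for margin <= 0, 0 for margin >= 1, linear between.
    return min(1.0, max(0.0, 1.0 - margin))


def empirical_risk(w, data):
    # data is a list of (x, y) pairs with y in {-1, +1}.
    return sum(ramp_loss(y * dot(w, x)) for x, y in data) / len(data)


def random_unit_vector(d, rng):
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def local_step(w, data, lr):
    # One subgradient step on the ramp loss, projected back onto the unit ball.
    grad = [0.0] * len(w)
    for x, y in data:
        if 0.0 < y * dot(w, x) < 1.0:  # the ramp loss has slope -1 only here
            for j, xj in enumerate(x):
                grad[j] -= y * xj / len(data)
    cand = [wi - lr * gi for wi, gi in zip(w, grad)]
    norm = math.sqrt(dot(cand, cand))
    if norm > 1.0:
        cand = [c / norm for c in cand]
    # Accept the step only if it does not increase the empirical risk.
    return cand if empirical_risk(cand, data) <= empirical_risk(w, data) else w


def random_restarts(data, rounds, steps, lr, seed=0):
    rng = random.Random(seed)
    d = len(data[0][0])
    best_w, best_risk = None, float("inf")
    for _ in range(rounds):
        w = random_unit_vector(d, rng)  # fresh random initialization
        for _ in range(steps):
            w = local_step(w, data, lr)
        risk = empirical_risk(w, data)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk
```

Calling `random_restarts(data, rounds=20, steps=10, lr=0.5)` on a small labeled sample returns the lowest-risk weight vector found across the restarts together with its empirical risk. How many rounds would be needed for a guarantee is the subject of the paper's bounds and is not modeled here.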
