Depth Separations in Neural Networks: Separating the Dimension from the Accuracy

Abstract

We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs), when approximating a O(1)-Lipschitz target function to constant accuracy, with respect to a distribution with support in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in safran2019depth, and proves that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network. Previously, lower bounds that were used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, target accuracy or (some measure of) the size of the domain of approximation scale polynomially with the input dimension, whereas in our result these parameters are fixed to be constants independent of the input dimension: our parameters are simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst- to average-case random self-reducibility argument, allowing us to leverage depth 2 threshold circuits lower bounds in a new domain.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…