Best k-layer neural network approximations

Abstract

We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set $s_1, \dots, s_n \in \mathbb{R}^p$ with corresponding responses $t_1, \dots, t_n \in \mathbb{R}^q$, fitting a $k$-layer neural network $\nu_\theta : \mathbb{R}^p \to \mathbb{R}^q$ involves estimation of the weights $\theta \in \mathbb{R}^m$ via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - \nu_\theta(s_i) \rVert_2^2. \] We show that even for $k = 2$, this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. A high-level explanation is like that for the nonexistence of best rank-$r$ approximations of higher-order tensors (the set of parameters is not a closed set), but the geometry involved for best $k$-layer neural network approximations is more subtle. In addition, we show that for smooth activations $\sigma(x) = 1/(1 + \exp(-x))$ and $\sigma(x) = \tanh(x)$, such failure to attain an infimum can happen on a positive-measured subset of responses. For the ReLU activation $\sigma(x) = \max(0, x)$, we completely classify cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with $d$ neurons in the hidden layer: it is the join locus of a line and the $d$-secant locus of a cone.
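
To make the ERM objective concrete, the following is a minimal sketch of how the sum of squared errors could be evaluated for a two-layer network with $d$ hidden neurons. The parametrization $\nu_\theta(s) = B\,\sigma(As + b) + c$, the tuple layout of `theta`, and the names `two_layer` and `empirical_risk` are assumptions made for illustration, not notation taken from the paper. The sketch only evaluates the objective at a given $\theta$; it does not demonstrate the non-attainment result.

```python
import math


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def two_layer(theta, s, activation):
    """Evaluate an assumed two-layer network B * act(A s + b) + c at input s.

    theta = (A, b, B, c) with A of shape d x p, b of length d,
    B of shape q x d, and c of length q (hypothetical layout).
    """
    A, b, B, c = theta
    hidden = [
        activation(sum(a_ij * s_j for a_ij, s_j in zip(row, s)) + b_i)
        for row, b_i in zip(A, b)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + c_k
        for row, c_k in zip(B, c)
    ]


def empirical_risk(theta, samples, responses, activation):
    """Sum over i of the squared Euclidean norm of t_i - nu_theta(s_i)."""
    total = 0.0
    for s, t in zip(samples, responses):
        out = two_layer(theta, s, activation)
        total += sum((t_k - o_k) ** 2 for t_k, o_k in zip(t, out))
    return total


# Toy data with p = 1, q = 1, n = 3 (made up for illustration).
samples = [[-1.0], [0.0], [2.0]]
responses = [[1.0], [0.0], [2.0]]

# With d = 2 ReLU neurons, relu(x) + relu(-x) = |x| fits this data exactly.
theta_abs = ([[1.0], [-1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.0])
risk_relu = empirical_risk(theta_abs, samples, responses, relu)

# All-zero hidden weights with a sigmoid give the constant output 0.5 + 0.5 = 1.
theta_const = ([[0.0], [0.0]], [0.0, 0.0], [[1.0, 1.0]], [0.0])
risk_sigmoid = empirical_risk(theta_const, samples, responses, sigmoid)
```

On this toy data the infimum is attained (the ReLU network above has zero risk); the abstract's claim is that for other configurations of responses no minimizing $\theta$ exists.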
