Minimax Lower Bounds for Ridge Combinations Including Neural Nets

Abstract

Estimation of functions of $d$ variables is considered using ridge combinations of the form $\sum_{k=1}^{m} c_{1,k}\,\phi\bigl(\sum_{j=1}^{d} c_{0,j,k} x_j - b_k\bigr)$, where the activation function $\phi$ has bounded value and derivative. These include single-hidden-layer neural networks, polynomials, and sinusoidal models. From a sample of size $n$ of possibly noisy values at random sites $X \in B = [-1,1]^d$, the minimax mean square error is examined for functions in the closure of the $\ell_1$ hull of ridge functions with activation $\phi$. It is shown to be of order $d/n$ to a fractional power (when $d$ is of smaller order than $n$), and to be of order $(\log d)/n$ to a fractional power (when $d$ is of larger order than $n$). Dependence on constraints $v_0$ and $v_1$ on the $\ell_1$ norms of the inner parameter $c_0$ and outer parameter $c_1$, respectively, is also examined. Lower and upper bounds on the fractional power are also given. The heart of the analysis is the development of information-theoretic packing numbers for these classes of functions.
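
To make the ridge-combination form concrete, the following is a minimal Python sketch that evaluates such a function at a single point. It is an illustration only and not code from the paper: the names `ridge_combination` and `l1_norm` are hypothetical, the inner weights are stored as `c0[k][j]` (one weight vector per ridge unit k), and `tanh` is assumed as one example of an activation with bounded value and derivative.

```python
import math


def l1_norm(values):
    """Return the l1 norm (sum of absolute values) of a sequence of numbers."""
    return sum(abs(v) for v in values)


def ridge_combination(x, c0, c1, b, phi=math.tanh):
    """Evaluate sum_k c1[k] * phi(sum_j c0[k][j] * x[j] - b[k]).

    x   : list of d inputs, each assumed to lie in [-1, 1]
    c0  : list of m inner weight vectors, each of length d (assumed layout)
    c1  : list of m outer weights
    b   : list of m shifts
    phi : activation; tanh is an assumed example, not the paper's choice
    """
    if not (len(c0) == len(c1) == len(b)):
        raise ValueError("c0, c1 and b must all have one entry per ridge unit")
    total = 0.0
    for weights, outer, shift in zip(c0, c1, b):
        if len(weights) != len(x):
            raise ValueError("each inner weight vector must match len(x)")
        # Inner ridge projection: a linear function of x minus a shift.
        inner = sum(w * xj for w, xj in zip(weights, x)) - shift
        total += outer * phi(inner)
    return total


# Two ridge units in d = 2 dimensions.
x = [0.5, -0.5]
c0 = [[1.0, 1.0], [1.0, -1.0]]
c1 = [2.0, -1.0]
b = [0.0, 0.0]
value = ridge_combination(x, c0, c1, b)
```

Because `tanh` takes values in [-1, 1], the absolute value of the output here cannot exceed `l1_norm(c1)`, which is one way to see why a constraint on the outer $\ell_1$ norm controls the size of the function class.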
