Fixed width treelike neural networks capacity analysis -- generic activations
Abstract
We consider the capacity of treelike committee machine (TCM) neural networks. Relying on Random Duality Theory (RDT), Stojnictcmspnncaprdt23 recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called partially lifted RDT (pl RDT) was then presented in Stojnictcmspnncapliftedrdt23. Both lines of work focused on networks with the most typical, sign, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of Stojnictcmspnncaprdt23, Stojnictcmspnncapliftedrdt23 are sufficiently powerful to handle such scenarios as well. In addition to the standard linear activations, we uncover that particularly convenient results can be obtained for two very commonly used activations, namely, the quadratic and rectified linear unit (ReLU) ones. In more concrete terms, for each of these activations, we obtain both the RDT and pl RDT based upper-bound characterizations of the memory capacity for any given (even) number of hidden layer neurons, d. In the process, we also uncover the following two, rather remarkable, facts: 1) contrary to the common wisdom, both sets of results show that the bounding capacity decreases for large d (the width of the hidden layer) while converging to a constant value; and 2) the maximum bounding capacity is achieved for the networks with precisely two hidden layer neurons! Moreover, the large d converging values are observed to be in excellent agreement with the statistical physics replica theory based predictions.