Exponential Approximation Rates and Parameter Efficiency of Learnable Bernstein Activations
Abstract
The choice of activation function fundamentally shapes the representational capacity and parameter efficiency of deep neural networks, yet most widely used activations lack rigorous theoretical guarantees on these properties. We provide a theoretical analysis of DeepBern-Nets (DBNs) -- networks employing learnable Bernstein polynomial activations -- showing that their approximation error decays with the network depth L and the polynomial order n at a rate of O(n^{-L}), exponentially faster than the polynomial rate of ReLU architectures, while remaining fully differentiable. We validate these predictions through 1,344 experiments on large scientific datasets (HIGGS and SUSY), comparing DBNs against ReLU, Leaky ReLU, SELU, and GeLU. DBNs achieve over 70% parameter reduction across the majority of architectures -- reaching 99.9% at scale -- converge to ReLU's final loss in as few as 26% of the training epochs, and attain up to 45% lower final loss. These advantages hold over all tested activations, confirming that DBNs' gains stem from the learnable polynomial structure rather than mere smoothness.
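To make the term "learnable Bernstein polynomial activation" concrete, the following is a minimal sketch of how such an activation could be evaluated for a single scalar input. It is an illustration under stated assumptions, not the authors' implementation: the function name `bernstein_activation`, the input interval `[lo, hi]`, and the clipping of inputs to that interval are all hypothetical choices made here. The coefficients `coeffs` play the role of the learnable parameters, and the polynomial order n is `len(coeffs) - 1`.

```python
from math import comb


def bernstein_activation(x, coeffs, lo=-1.0, hi=1.0):
    """Evaluate sum_k coeffs[k] * C(n, k) * t^k * (1 - t)^(n - k).

    Assumptions (not taken from the paper): the input is rescaled from
    [lo, hi] to t in [0, 1] and clipped to stay inside that range.
    """
    n = len(coeffs) - 1  # polynomial order
    t = (x - lo) / (hi - lo)  # rescale input to the unit interval
    t = min(max(t, 0.0), 1.0)  # clip so the basis stays well defined
    return sum(
        c * comb(n, k) * t**k * (1.0 - t) ** (n - k)
        for k, c in enumerate(coeffs)
    )


# Coefficients c_k = k / n reproduce the identity map on t.
identity_like = [0.0, 0.5, 1.0]
midpoint_value = bernstein_activation(0.0, identity_like)

# At the interval endpoints the output equals the first / last coefficient.
left_value = bernstein_activation(-1.0, [2.0, -3.0, 5.0])
right_value = bernstein_activation(1.0, [2.0, -3.0, 5.0])
```

Because every term is a polynomial in the input and linear in the coefficients, the activation is smooth in both, which is consistent with the abstract's description of DBNs as fully differentiable. In a trainable network the coefficients would be stored as per-layer or per-neuron parameters in an automatic-differentiation framework; that wiring is omitted here.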