When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth

Abstract

Early-exit neural networks enable adaptive computation by allowing confident predictions to exit at intermediate layers, achieving 2-8× inference speedup. Despite widespread deployment, their generalization properties lack theoretical understanding, a gap explicitly identified in recent surveys. This paper establishes a unified PAC-Bayesian framework for adaptive-depth networks. (1) Novel Entropy-Based Bounds: We prove the first generalization bounds depending on exit-depth entropy H(D) and expected depth E[D] rather than maximum depth K, with sample complexity O((E[D] · d + H(D))/ε²). (2) Explicit Constructive Constants: Our analysis yields the leading coefficient √(2 ln 2) ≈ 1.177 with complete derivation. (3) Provable Early-Exit Advantages: We establish sufficient conditions under which adaptive-depth networks strictly outperform fixed-depth counterparts. (4) Extension to Approximate Label Independence: We relax the label-independence assumption to ε-approximate policies, broadening applicability to learned routing. (5) Comprehensive Validation: Experiments across 6 architectures on 7 benchmarks demonstrate tightness ratios of 1.52-3.87× (all p < 0.001) versus >100× for classical bounds. Bound-guided threshold selection matches validation-tuned performance within 0.1-0.3%.
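To make the quantities in the sample-complexity expression concrete, the following is a minimal sketch of how E[D], H(D), and the term (E[D] · d + H(D))/ε² could be computed from an empirical distribution over exit depths. It is not the paper's implementation. The function names are hypothetical, entropy is taken in nats (the abstract does not state the unit), d is treated as a per-layer complexity parameter as an assumption, and the hidden constant in the O(·) is ignored.

```python
import math


def exit_depth_stats(exit_probs):
    """Return (E[D], H(D)) for exit probabilities over depths 1..K.

    exit_probs[k-1] is the assumed probability of exiting at depth k.
    Entropy is computed in nats (an assumption; the source gives no unit).
    """
    if abs(sum(exit_probs) - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")
    expected_depth = sum(k * p for k, p in enumerate(exit_probs, start=1))
    # Zero-probability depths contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in exit_probs if p > 0)
    return expected_depth, entropy


def complexity_proxy(exit_probs, d, eps):
    """Evaluate (E[D] * d + H(D)) / eps^2, dropping the O(.) constant."""
    expected_depth, entropy = exit_depth_stats(exit_probs)
    return (expected_depth * d + entropy) / eps ** 2


# Hypothetical exit distributions over K = 4 depths.
adaptive = [0.7, 0.2, 0.1, 0.0]   # most inputs exit early
fixed = [0.0, 0.0, 0.0, 1.0]      # every input runs to full depth

adaptive_proxy = complexity_proxy(adaptive, d=1000, eps=0.1)
fixed_proxy = complexity_proxy(fixed, d=1000, eps=0.1)
```

Under these illustrative numbers, the adaptive policy has E[D] = 1.4 versus 4 for the fixed-depth network, so the proxy is smaller whenever d is large relative to the entropy term. This only illustrates the shape of the expression and says nothing about the tightness the paper reports.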
