Effective Sample Size and Generalization Bounds for Temporal Networks
Abstract
Learning from time series is fundamentally different from learning from i.i.d. data: temporal dependence can make long sequences effectively information-poor, yet standard evaluation protocols conflate sequence length with statistical information. We propose a dependence-aware evaluation methodology that controls for effective sample size $N_{\text{eff}}$ rather than raw length $N$, and provide end-to-end generalization guarantees for Temporal Convolutional Networks (TCNs) on β-mixing sequences. Our analysis combines a blocking/coupling reduction that extracts B = (N/ N) approximately independent anchors with an architecture-aware Rademacher bound for $\ell_{2,1}$-norm-controlled convolutional networks, yielding O(D p / B) complexity scaling in depth $D$ and kernel size $p$. Empirically, we find that stronger temporal dependence can reduce generalization gaps when comparisons control for $N_{\text{eff}}$, a conclusion that reverses under standard fixed-$N$ evaluation, with observed rates of $N_{\text{eff}}^{-0.9}$ to $N_{\text{eff}}^{-1.2}$ substantially faster than the worst-case $O(N^{-1/2})$ mixing-based prediction. Our results suggest that dependence-aware evaluation should become standard practice in temporal deep learning benchmarks.
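The abstract does not state how the effective sample size or the anchors are computed, so the following is only a rough illustration of the two ideas, not the paper's construction. It assumes a stationary AR(1) process with lag-one autocorrelation `rho`, for which the effective sample size of the sample mean is asymptotically $N(1-\rho)/(1+\rho)$, and it treats "anchors" as observations kept at a fixed spacing `gap`. The function names `ar1_effective_sample_size` and `extract_anchors`, and both parameters, are hypothetical.

```python
def ar1_effective_sample_size(n, rho):
    """Asymptotic effective sample size of the sample mean of a
    stationary AR(1) series with lag-one autocorrelation rho.

    Assumption: AR(1) dependence; the paper's own definition of the
    effective sample size may differ.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


def extract_anchors(series, gap):
    """Keep every gap-th observation as an 'anchor'.

    Assumption: widely spaced points of a mixing sequence are treated
    as approximately independent; this is a simplified stand-in for a
    blocking/coupling argument, not the paper's procedure.
    """
    if gap < 1:
        raise ValueError("gap must be a positive integer")
    return series[::gap]


# Two series of the same raw length N carry very different amounts of
# information once dependence is taken into account.
n = 1000
weak = ar1_effective_sample_size(n, 0.1)    # about 818
strong = ar1_effective_sample_size(n, 0.9)  # about 53

# Spacing anchors 3 steps apart keeps 4 of 10 observations.
anchors = extract_anchors(list(range(10)), 3)
```

Under these assumptions, a fixed-$N$ comparison between the two series would silently compare roughly 818 effective samples against roughly 53, which is the kind of confound the dependence-aware protocol is meant to remove.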