Closing the Theory-Practice Gap in Spiking Transformers via Effective Dimension

Abstract

Spiking transformers achieve competitive accuracy with conventional transformers while offering 38–57× energy efficiency on neuromorphic hardware, yet no theoretical framework guides their design. This paper establishes the first comprehensive expressivity theory for spiking self-attention. We prove that spiking attention with Leaky Integrate-and-Fire neurons is a universal approximator of continuous permutation-equivariant functions, providing explicit spike circuit constructions including a novel lateral inhibition network for softmax normalization with proven $O(1/\sqrt{T})$ convergence. We derive tight spike-count lower bounds via rate-distortion theory: $\varepsilon$-approximation requires $\Omega(L_f^2 nd/\varepsilon^2)$ spikes, with rigorous information-theoretic derivation. Our key insight is input-dependent bounds using measured effective dimensions ($d_{\mathrm{eff}} = 47$–$89$ for CIFAR/ImageNet), explaining why $T = 4$ timesteps suffice despite worst-case $T \geq 10{,}000$ predictions. We provide concrete design rules with calibrated constants ($C = 2.3$, 95% CI: [1.9, 2.7]). Experiments on Spikformer, QKFormer, and SpikingResformer across vision and language benchmarks validate predictions with $R^2 = 0.97$ ($p < 0.001$). Our framework provides the first principled foundation for neuromorphic transformer design.
