LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces
Abstract
Backpropagation is inherently sequential across depth, creating an $O(K)$-deep dependency chain that bottlenecks parallel training. While parallel-scan formulations theoretically reduce this depth to $O(\log K)$, they are computationally prohibitive for modern architectures due to the $O(d^3)$ cost of composing full-rank $d \times d$ Jacobians over the entire hidden state. We introduce Latent Bounded Interfaces (LBI), an algorithmic formulation that makes scan-based backpropagation tractable by restricting inter-region communication to a low-dimensional latent interface, $m_k \in \mathbb{R}^r$, where $r \ll d$. This reduces the adjoint recursion to a suffix scan over $r \times r$ Jacobians, cutting per-combine cost from $O(d^3)$ to $O(r^3)$ while preserving exact gradients under the bounded-interface model. We demonstrate that LBI maintains model quality across four architectures (Mamba-2, Mamba-3, Transformer, and a Mamba–Transformer hybrid) at 47–61M block parameters. Interfaces of dimension $r = 16$ suffice to preserve training quality within 0.16–0.35 cross entropy of dense baselines. The resulting framework provides an algorithmic foundation for region-parallel training, reducing cross-device backward communication to a single scan over $K$ fixed-size matrices, of approximately 56 KB for our experimental configurations.
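To make the suffix-scan idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes each region k exposes an r × r interface Jacobian J_k = ∂m_{k+1}/∂m_k, so that the adjoint recursion is g_k = J_k^T g_{k+1}. The function names (`sequential_adjoints`, `suffix_scan_adjoints`) and the random Jacobians are hypothetical, chosen only for illustration. The scan uses a simple doubling scheme in which every combine is an r × r matrix product, and it is checked against the sequential recursion.

```python
import numpy as np


def sequential_adjoints(jacobians, g_last):
    """Reference recursion g_k = J_k^T g_{k+1}, walked from k = K-1 down to 0."""
    K = jacobians.shape[0]
    grads = np.empty((K, g_last.shape[0]))
    g = g_last
    for k in range(K - 1, -1, -1):
        g = jacobians[k].T @ g
        grads[k] = g
    return grads


def suffix_scan_adjoints(jacobians, g_last):
    """Same adjoints via a doubling suffix scan over r x r matrices.

    After the loop, P[k] = J_k^T J_{k+1}^T ... J_{K-1}^T, so g_k = P[k] g_K.
    Each pass combines all pairs at once; there are ceil(log2 K) passes.
    """
    K = jacobians.shape[0]
    P = np.transpose(jacobians, (0, 2, 1)).copy()  # P[k] starts as J_k^T
    offset, passes = 1, 0
    while offset < K:
        nxt = P.copy()
        # Batched r x r products; entries with k + offset >= K are already complete.
        nxt[: K - offset] = P[: K - offset] @ P[offset:]
        P = nxt
        offset *= 2
        passes += 1
    return P @ g_last, passes


# Illustrative sizes only: K regions, interface dimension r (assumed values).
K, r = 8, 16
rng = np.random.default_rng(0)
jacobians = rng.standard_normal((K, r, r)) / np.sqrt(r)
g_last = rng.standard_normal(r)

grads_seq = sequential_adjoints(jacobians, g_last)
grads_scan, passes = suffix_scan_adjoints(jacobians, g_last)
assert np.allclose(grads_seq, grads_scan)
```

Both routines return the same adjoints, but the scan needs only a logarithmic number of dependent passes, and the cost of each matrix combine scales with r rather than d. A work-efficient scan or a framework primitive could replace the doubling loop; the doubling form is used here only because it is short.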