Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Abstract

The dense token-to-token interaction pattern of standard dot-product attention remains a central bottleneck in scaling Transformer architectures to long contexts. We introduce Gaussian Mixture Attention (GMA), a probabilistic attention-style sequence mixer that replaces explicit pairwise query--key comparison with routing through K learned Gaussian mixture components. Queries and keys are mapped to posterior responsibility vectors over a shared latent routing space; their overlap defines an implicit responsibility-space affinity, while values are written into and read from a K-slot latent memory. By exploiting the associativity of matrix multiplication, GMA avoids materializing the induced N× N affinity matrix and instead uses two responsibility matrices whose dominant activation storage scales as O(NK) rather than O(N2) for fixed K. We formulate bidirectional and causal variants of GMA, provide an end-to-end differentiable parameterization of the Gaussian mixture components, and analyze its responsibility-modulated gradient structure, constrained non-negative low-rank affinity interpretation, and local routing stability. Empirically, GMA exhibits the intended fixed-K linear memory scaling and is competitive with attention-style baselines on long-context classification, while causal GMA improves over tested linear/random-feature attention variants on WikiText-103 but remains behind optimized causal SDPA and Mamba in the current implementation. Analysis of learned responsibilities further shows broad component usage and moderate alignment with surface-form token categories, supporting GMA as a probabilistic, interpretable, fixed-K linear-time attention-style alternative rather than a universal replacement for optimized softmax attention or state-space models.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…