Gated MLPs as Symmetry-Broken Rank-1 Bilinear Attention

Abstract

We show that the conventional gated MLP can be viewed as a rank-1 approximation to a bilinear attention mechanism with two distinct factors corresponding to the query and the key. We further show that moving the nonlinearity onto one factor breaks the exchange symmetry between the two factors and, for non-homogeneous activations, the inverse-scaling symmetry as well. This perspective may help explain why gated MLPs are effective in practice and inform the design of future architectures.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…