On the Existence of Universal Simulators of Attention

Abstract

Previous work on the learnability of transformers, which has focused primarily on their ability to approximate specific algorithmic patterns through training, has largely been data-driven, offering only probabilistic guarantees rather than deterministic solutions. Research on expressivity, by contrast, has addressed theoretically which problems such architectures can compute. These results proved the Turing-completeness of transformers and investigated bounds based on circuit complexity and formal logic. At the crossroads of learnability and expressivity, a question remains: can a transformer, as a computational model, simulate an arbitrary attention mechanism or, in particular, its underlying operations? In this study, we investigate the transformer encoder's ability to simulate a vanilla attention mechanism. By constructing a universal simulator U composed of transformer encoders, we use RASP, a formal framework for transformer computation, to present algorithmic solutions that replicate attention outputs and the underlying elementary matrix and activation operations. We thereby show the existence of an algorithmically achievable, data-agnostic solution to a problem previously known to be approximated only by learning.
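For orientation, the sketch below places the target computation next to the formalism used to express the simulator. It assumes that "vanilla attention" means standard single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, with no masking. It also gives simplified re-implementations of RASP's `select` and `aggregate` primitives, where `aggregate` takes the mean of the selected values as in the original RASP formulation. All helper names (`matmul`, `vanilla_attention`, and so on) are hypothetical, and this is an illustrative sketch, not the paper's universal simulator U.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) matrix and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def vanilla_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Assumed target: softmax(Q K^T / sqrt(d_k)) V, single head, no mask."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

# Simplified, hypothetical RASP-style primitives.
def select(keys: list, queries: list,
           predicate: Callable[[object, object], bool]) -> List[List[bool]]:
    """Boolean selector indexed [query][key]; predicate is called as (key, query)."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector: List[List[bool]], values: List[float]) -> List[float]:
    """For each query row, average the values at the selected key positions."""
    out = []
    for row in selector:
        chosen = [v for v, s in zip(values, row) if s]
        out.append(sum(chosen) / len(chosen) if chosen else 0.0)
    return out
```

For example, `vanilla_attention([[0.0], [0.0]], [[1.0], [2.0]], [[3.0], [5.0]])` gives uniform weights (all scores are zero) and returns `[[4.0], [4.0]]`, which matches `aggregate(select([0, 1], [0, 1], lambda k, q: True), [3.0, 5.0])`. With a causal-style predicate `lambda k, q: k <= q`, the same `aggregate` call returns `[3.0, 4.0]`. A single boolean `select`/`aggregate` pair can only produce weights that are uniform over a subset of positions. General softmax weights therefore need to be built from further operations, which is the kind of elementary matrix and activation work the abstract refers to.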
