Unveiling the Reasoning Process of Large Language Models

Abstract

Large language models often reason beyond surface tokens, but the internal stage at which token-level information becomes abstract relational structure remains unclear. We investigate this question by analyzing how attention heads and layers transform information during autoregressive reasoning. Across mathematical and symbolic reasoning tasks, we observe a consistent layer-wise division of labor: outer layers mainly preserve and route input-related features, whereas middle layers reorganize them into more transferable rule-level representations. This interpretation is supported by representation geometry: middle-layer states occupy lower-dimensional manifolds and show stronger alignment across disjoint vocabularies that instantiate the same symbolic rules. It is further supported by causal interventions: removing middle-layer components identified by our interaction-based criterion produces substantially larger downstream changes and accuracy drops than removing components from other regions or at random. Together, these results suggest that abstract reasoning is not uniformly distributed across transformer layers, but is preferentially formed in a middle-layer computation stage that converts token-level information into reusable relational structure.
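
The abstract does not say which measure underlies the claim that middle-layer states occupy lower-dimensional manifolds. As an illustration only, one common proxy for effective dimensionality is the participation ratio of the covariance spectrum, PR = (tr C)^2 / tr(C^2), which equals 1 when all variance lies along a single direction and d when variance is spread evenly over d directions. The sketch below assumes hidden states are already available as plain lists of floats; the function name `participation_ratio` and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import List


def participation_ratio(states: List[List[float]]) -> float:
    """Effective dimensionality of a set of vectors (assumed proxy, not the paper's metric).

    Computes PR = (tr C)^2 / tr(C^2), where C is the covariance of `states`.
    For symmetric C, tr(C^2) is the sum of squared entries, so no
    eigendecomposition is needed.
    """
    n, d = len(states), len(states[0])
    # Centre each coordinate across the n samples.
    means = [sum(row[j] for row in states) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in states]
    # Covariance matrix; the 1/n normalisation cancels in the ratio.
    cov = [
        [sum(r[j] * r[k] for r in centered) / n for k in range(d)]
        for j in range(d)
    ]
    trace = sum(cov[j][j] for j in range(d))
    trace_sq = sum(cov[j][k] ** 2 for j in range(d) for k in range(d))
    if trace_sq == 0:
        raise ValueError("states have zero variance")
    return trace ** 2 / trace_sq


# Toy inputs standing in for per-layer hidden states (hypothetical data).
on_a_line = [[t, 2.0 * t, 0.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
isotropic = [
    [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
]
pr_line = participation_ratio(on_a_line)  # variance along one direction
pr_iso = participation_ratio(isotropic)   # variance spread over three axes
```

Under this proxy, applying the function to states collected at each layer and comparing the values would give one way to check whether middle layers are lower-dimensional than outer ones; other estimators may behave differently.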
