Transmuting prompts into weights
Abstract
A growing body of research has demonstrated that the behavior of large language models can be effectively controlled at inference time by directly modifying their internal states, either through vector additions to their activations or through updates to their weight matrices. These techniques, while powerful, are often guided by empirical heuristics, such as deriving "steering vectors" from the average activations of contrastive prompts. Dherin et al. (2025) showed that a prompt's influence maps mathematically to token-dependent implicit weight updates and introduced the initial concept of a static thought patch for prompt compression. Building on that foundational work, we elevate this framework into a robust algorithm for direct model editing. We derive a principled method for condensing this transient information into token-independent thought vectors and thought matrices. These constructs provide a theoretical explanation for existing vector- and matrix-based model editing techniques and offer a direct, computationally grounded method for transmuting textual input into reusable weight updates for complex architectures and new knowledge injection.
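To make the two kinds of intervention concrete, here is a minimal NumPy sketch. It is not the paper's algorithm. The activations are random stand-ins rather than outputs of a real model, and the function names `steering_vector` and `rank_one_update` are hypothetical. The first function implements the contrastive-mean heuristic mentioned above. The second shows one simple way, assumed here for illustration, to absorb a context-induced activation shift into a rank-one weight update so that the edited layer reproduces the with-context output on the context-free input.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)

# Hypothetical stand-ins for hidden activations collected on contrastive prompts.
pos_acts = rng.normal(size=(16, d)) + 1.0
neg_acts = rng.normal(size=(16, d)) - 1.0


def steering_vector(pos, neg):
    """Difference of mean activations over two contrastive prompt sets."""
    return pos.mean(axis=0) - neg.mean(axis=0)


def rank_one_update(W, a_plain, a_ctx):
    """Rank-one dW such that (W + dW) @ a_plain == W @ a_ctx.

    Illustrative construction only; it depends on a_plain, so the
    resulting update is specific to the token that produced it.
    """
    delta = a_ctx - a_plain
    return np.outer(W @ delta, a_plain) / (a_plain @ a_plain)


v = steering_vector(pos_acts, neg_acts)

W = rng.normal(size=(d, d))   # weights of the layer that follows
a_plain = rng.normal(size=d)  # activation without the prompt
a_ctx = a_plain + v           # activation shifted by the prompt (assumed)

dW = rank_one_update(W, a_plain, a_ctx)

# The edited weights reproduce the with-context output on the plain input.
assert np.allclose((W + dW) @ a_plain, W @ a_ctx)
```

Because `dW` is built from `a_plain`, it changes with every token. This is the token dependence that the thought vectors and thought matrices described above are meant to remove.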