CALYREX: Cross-Attention LaYeR EXtended Transformers for System Prompt Anchoring

Abstract

Modern large language models (LLMs) rely on system prompts to establish behavioral constraints and safety rules. Standard causal self-attention treats privileged instructions and untrusted user content with equal structural priority, a mismatch that leaves models vulnerable to prompt injection and to instruction erosion over extended contexts. We propose CALYREX (Cross-Attention LaYeR EXtended transformers), which uses cross-attention between the input and the system prompt to structurally isolate and anchor those rules. A placement ablation on a 1.5B backbone identifies insertion at the final eighth of layers as optimal, a result confirmed by mechanistic activation analysis showing that behavioral constraints are naturally concentrated there. At 8B scale, controlling for training data, backbone, and parameter budget, CALYREX yields +7.4% on instruction-following (IFEval) and +16.3% on multi-turn instruction adherence, while reducing the many-shot jailbreaking attack success rate by 13%. This advantage appears to widen with model scale, consistent with larger models making more effective use of the dedicated routing pathway.
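The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the two ideas it names: placing the extra layers in the final eighth of the stack, and letting input tokens cross-attend to the system prompt. The function names, the 32-layer example, the reading of "final eighth" as the last num_layers // 8 layers, the choice of input states as queries and system prompt states as keys and values, the omission of learned projections, and the residual connection are all assumptions made for illustration, not the authors' design.

```python
import math


def final_eighth_layers(num_layers):
    """Indices of the last eighth of the layer stack (assumed reading of the abstract)."""
    count = max(1, num_layers // 8)
    return list(range(num_layers - count, num_layers))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(input_states, system_states):
    """Single-head cross-attention without learned projections (assumption).

    Queries come from the input token states; keys and values come from the
    system prompt states, so every input position reads the system prompt
    through a pathway separate from causal self-attention.
    """
    dim = len(system_states[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in input_states:
        # Scaled dot-product score of this input token against each system token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in system_states]
        weights = softmax(scores)
        # Weighted sum of the system prompt states.
        attended = [sum(w * value[j] for w, value in zip(weights, system_states))
                    for j in range(dim)]
        # Residual connection (assumption): add the attended vector to the input state.
        outputs.append([x + a for x, a in zip(query, attended)])
    return outputs


# Hypothetical 32-layer stack: the extra layers would sit at indices 28..31.
layers = final_eighth_layers(32)
# One input token attending to a single system prompt token.
anchored = cross_attention([[1.0, 0.0]], [[0.5, 0.5]])
```

With a single system prompt token, the attention weight is 1 and the output is simply the input state plus that token's state. A real model would use learned query, key, and value projections and multiple heads.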
