How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

Xiao Wang

Abstract

The key-value (KV) cache is the dominant memory bottleneck during Transformer inference, yet little is known theoretically about how aggressively it can be compressed before multi-step reasoning degrades. We study this through $k$-hop pointer chasing on $n$ tokens under a shared KV cache of size $s$, attention dimension $m$, $H$ heads, $p$-bit precision, and a locality-respecting cache controller (satisfied by all standard KV-compression methods). We give three results. (1) Product depth lower bound (conjectured). We conjecture that any such Transformer ($n \geq 4k$, $s \leq n/4$) requires depth $L = \Omega(\lceil k/s \rceil \cdot \lceil \log_2 n/(Hmp) \rceil)$, and isolate the sole remaining gap as a probabilistic step on the joint distribution of cache trace and pointer chain. Unconditionally, we prove a matching upper bound $L = O(\min(k, \lceil k/s \rceil \log s) \cdot \log n/(mp))$ via windowed pointer doubling, and a max-bound $L = \Omega(\max(\lceil k/s \rceil, \log n/(Hmp)))$. Closing the conjecture amounts to upgrading max to product. (2) Bandwidth barrier. The product bound binds only when $Hmp \lesssim \log n$. Any lower bound provable via per-window distinguishability counting -- including reachability, bandwidth, and combinations -- cannot exceed $\lceil k/s \rceil$ once $Hmp \geq \log_2 n$. Breaking this requires lifting unconditional communication-complexity bounds for pointer chasing to Cache-Transformer depth. (3) Adaptive vs oblivious error scaling. Under random cache over $T = \lceil \log_2 k \rceil$ doubling stages, oblivious caches give $\Pr[E] \leq (s/(n-T))^T + 2T^3/n$ (exponential in $T$), while adaptive locality-respecting caches achieve $\Pr[E] = s/n$ exactly, independent of $T$. The $\Omega((n/s)^{T-1})$ separation explains why heavy-hitter eviction empirically dominates random eviction for multi-hop reasoning.
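For readers unfamiliar with the task, the following is a minimal Python sketch of $k$-hop pointer chasing and of plain pointer doubling, the idea underlying the upper bound. It is an illustration only: it is not the paper's windowed construction and does not model a Transformer, a KV cache, or a cache controller. The function names `chase` and `chase_by_doubling` and the random pointer map are assumptions made for this example.

```python
import random


def chase(f, start, k):
    """Follow the pointer map f for k hops, one hop per step."""
    x = start
    for _ in range(k):
        x = f[x]
    return x


def chase_by_doubling(f, start, k):
    """Compute the same k-hop target by repeatedly squaring the pointer map.

    Returns (target, stages), where stages counts how many times the map
    was composed with itself.
    """
    power = list(f)  # after j squarings, power[i] is the 2**j-hop target of i
    x = start
    stages = 0
    while k > 0:
        if k & 1:
            # This bit of k is set, so apply the current power of f.
            x = power[x]
        k >>= 1
        if k > 0:
            # Square the map: hop twice through the current power.
            power = [power[power[i]] for i in range(len(power))]
            stages += 1
    return x, stages


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 64, 16  # hypothetical sizes chosen so that n >= 4k
    f = [rng.randrange(n) for _ in range(n)]
    target, stages = chase_by_doubling(f, 0, k)
    assert target == chase(f, 0, k)
    print(f"{k} sequential hops vs {stages} doubling stages")
```

Sequential chasing takes $k$ dependent steps, while doubling needs only about $\log_2 k$ squaring stages, each of which touches every position. This is why the number of doubling stages, rather than $k$ itself, appears in the error-scaling result.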
