Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

Paolo D'Alberto

Abstract

We analyse three KV cache quantization schemes under a fair bit budget: KV (scalar MSE baseline), KQV (WHT + MSE on K; WHT + MSE + QJL on V), and QKQV (WHT + MSE + QJL on both). Starting from the Beta distribution on the hypersphere, we trace how QJL on K inflates inner product variance by π/2, which softmax amplifies nonlinearly via Jensen's inequality, and we present statistical inference and information metrics to highlight practical differences. Three empirical findings emerge. (1) At n=4 (the practically dominant budget), KQV wins on every measure (KL divergence, geometric K error, and 6D distance) across all distributions and ranks tested. (2) The K–V asymmetry is unconditional: QKQV is consistently worse than KQV in KL divergence at every budget and distribution. (3) A budget-dependent crossover exists: QKQV achieves better geometric K reconstruction at n ∈ {2,3,5}, KQV at n ∈ {4,6}, invariant to rank and tail weight; this remains an open rate-distortion problem. KL(p_ref ‖ p_quant), K-only by construction, bridges K direction error to routing corruption and output collapse. We present a sufficient condition under which the Jensen mechanism amplifies superlinearly through the softmax. At n ∈ {2,3,5}, QKQV wins geometrically because this condition does not bind. At n=4, elevated K error and KL divergence for QKQV strongly suggest that the Jensen mechanism is the operative cause of the crossover, providing a new perspective and explanation.
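The link between logit variance and KL(p_ref ‖ p_quant) can be illustrated with a small simulation. This is a minimal sketch and not the paper's experimental setup: it assumes zero-mean Gaussian noise on the attention logits as a stand-in for quantization error, and the names `mean_kl_under_noise` and `base_var`, together with the chosen sizes, are hypothetical. It compares the mean KL divergence at a baseline noise variance against the same noise with its variance scaled by π/2, the factor quoted above.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean_kl_under_noise(logits, noise_var, trials, seed):
    """Average KL(p_ref || p_noisy) when i.i.d. Gaussian noise is added to the logits.

    Assumption: Gaussian logit noise is an illustrative stand-in for
    quantization error; the paper's actual error model may differ.
    """
    rng = random.Random(seed)
    p_ref = softmax(logits)
    sigma = math.sqrt(noise_var)
    total = 0.0
    for _ in range(trials):
        noisy = [x + rng.gauss(0.0, sigma) for x in logits]
        total += kl_divergence(p_ref, softmax(noisy))
    return total / trials


# Hypothetical reference logits for one query over 64 keys.
logit_rng = random.Random(0)
logits = [logit_rng.gauss(0.0, 1.0) for _ in range(64)]

base_var = 0.05  # hypothetical baseline logit-noise variance

# Same seed for both runs, so only the noise scale differs between them.
kl_base = mean_kl_under_noise(logits, base_var, trials=2000, seed=1)
kl_inflated = mean_kl_under_noise(logits, base_var * math.pi / 2, trials=2000, seed=1)

print(f"mean KL at baseline variance:     {kl_base:.5f}")
print(f"mean KL at (pi/2)-scaled variance: {kl_inflated:.5f}")
```

Both runs reuse the same seed, so each trial draws the same noise direction at two different scales. For a fixed direction the KL term is convex in the scale and minimised at zero, so the inflated run cannot come out smaller. The simulation shows only the direction of the effect; it says nothing about the budget-dependent crossover reported above.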
