Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-n test-time scaling with a reward model r(x,y) and speculative samples from a small auxiliary model πS(y | x). We provably approximate both the optimal tilted policy πβ,B(y | x) ∝ πB(y | x) exp(β r(x,y)) of soft best-of-n under the base model πB and the expected reward under that optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K) and across different model families, our method achieves higher accuracy than both standard soft best-of-n with πS and reward-guided speculative decoding (Liao et al., 2025). In certain settings it even outperforms soft best-of-n with πB, while reducing end-to-end latency by up to 28%. The code is available at https://github.com/j-geuter/GSI .
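To make the soft best-of-n step concrete, the following is a minimal sketch of the selection rule only, not the GSI algorithm itself and not code from the linked repository. It assumes the n candidates have already been sampled and scored by a reward model; the function name `soft_best_of_n` and its arguments are hypothetical. One candidate is drawn with probability proportional to exp(β r(x,y)), which is the finite-sample counterpart of the tilted policy above.

```python
import math
import random


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Draw one candidate with probability proportional to exp(beta * reward).

    Illustrative helper (hypothetical name): candidates are assumed to be
    samples from some model, and rewards are their reward-model scores.
    """
    if not candidates or len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must be non-empty and equal in length")

    # Subtract the maximum reward before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    max_reward = max(rewards)
    weights = [math.exp(beta * (r - max_reward)) for r in rewards]
    total = sum(weights)
    probs = [w / total for w in weights]

    # random.choices samples according to the given weights.
    choice = rng.choices(candidates, weights=probs, k=1)[0]
    return choice, probs


if __name__ == "__main__":
    # Toy inputs, made up for illustration only.
    answers = ["answer A", "answer B", "answer C"]
    scores = [0.1, 0.9, 0.5]
    picked, probs = soft_best_of_n(answers, scores, beta=5.0)
    print(picked, [round(p, 3) for p in probs])
```

With β = 0 the selection is uniform over the candidates, and as β grows it approaches hard best-of-n, which always returns the highest-reward candidate.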
