Asymptotic Universal Alignment: A New Alignment Framework via Test-Time Scaling

Abstract

Aligning large language models (LLMs) to serve users with heterogeneous and potentially conflicting preferences is a central challenge for personalized and trustworthy AI. We formalize an ideal notion of universal alignment through test-time scaling: for each prompt, the model produces $k \ge 1$ candidate responses and a user selects their preferred one. We introduce $(k, f(k))$-robust alignment, which requires the $k$-output model to have win rate $f(k)$ against any other single-output model, and asymptotic universal alignment (U-alignment), which requires $f(k) \to 1$ as $k \to \infty$. Our main result characterizes the optimal convergence rate: there exists a family of single-output policies whose $k$-sample product policies achieve U-alignment at rate $f(k) = \frac{k}{k+1}$, and no method can achieve a faster rate in general. We show that popular post-training methods, including Nash learning from human feedback (NLHF), can fundamentally underutilize the benefits of test-time scaling. Even though NLHF is optimal for $k = 1$, sampling from the resulting (often deterministic) policy cannot guarantee win rates above $\frac{1}{2}$ except for an arbitrarily small slack. This stems from a lack of output diversity: existing alignment methods can collapse to a single majority-preferred response, making additional samples redundant. In contrast, our approach preserves output diversity and achieves the optimal test-time scaling rate. In particular, we propose a family of symmetric multi-player alignment games and prove that any symmetric Nash equilibrium policy of the $(k+1)$-player alignment game achieves the optimal $(k, \frac{k}{k+1})$-robust alignment. Finally, we provide theoretical convergence guarantees for self-play learning dynamics in these games and extend the framework to opponents that also generate multiple responses.
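The diversity argument can be illustrated with a small toy computation. The sketch below is not the paper's construction: the three-response, three-user-type population (a Condorcet cycle), the tie rule (a tie counts as half a win), and the names `user_score`, `win_rate`, and `worst_case` are all assumptions made here for illustration. It computes the exact worst-case win rate of a k-sample product policy against every single-response opponent, once for a deterministic policy and once for a uniform (diverse) one.

```python
from fractions import Fraction
from itertools import product

RESPONSES = ("A", "B", "C")

# Hypothetical population: three equally likely user types whose rankings
# (best response first) form a Condorcet cycle. This is an assumption of the
# sketch, not data from the paper.
RANKINGS = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]


def user_score(ranking, candidates, opponent):
    """Score for one user: 1 if their favorite candidate beats the opponent's
    response, 1/2 if it is the same response (assumed tie rule), else 0."""
    best = min(candidates, key=ranking.index)
    if best == opponent:
        return Fraction(1, 2)
    return Fraction(int(ranking.index(best) < ranking.index(opponent)))


def win_rate(policy, k, opponent):
    """Exact win rate of k i.i.d. samples from `policy` against a
    single-output opponent that always plays `opponent`."""
    total = Fraction(0)
    for draw in product(RESPONSES, repeat=k):
        prob = Fraction(1)
        for response in draw:
            prob *= policy[response]
        if prob == 0:
            continue
        avg = sum(user_score(r, draw, opponent) for r in RANKINGS) / len(RANKINGS)
        total += prob * avg
    return total


def worst_case(policy, k):
    """Win rate against the most damaging single-response opponent."""
    return min(win_rate(policy, k, opp) for opp in RESPONSES)


deterministic = {"A": Fraction(1), "B": Fraction(0), "C": Fraction(0)}
uniform = {r: Fraction(1, 3) for r in RESPONSES}

for k in (1, 2, 3, 4):
    print(k, worst_case(deterministic, k), worst_case(uniform, k))
```

In this toy setting, resampling the deterministic policy returns the same response every time, so its worst-case win rate does not change with k, while the uniform policy's worst-case win rate grows as k increases. The uniform policy is used only because it is the simplest diverse policy here; the sketch makes no claim that it matches the equilibrium policies or the $\frac{k}{k+1}$ rate established in the paper.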
