Prior Diffusiveness and Regret in the Linear-Gaussian Bandit
Abstract
We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that, to within logarithmic factors, the prior-dependent ``burn-in'' term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long run) regret $\sigma d \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new ``elliptical potential'' lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.
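To make the setting concrete, the following is a minimal sketch of Thompson sampling in a linear-Gaussian bandit, not the authors' code. It assumes a finite action set stored as the rows of `actions` (the abstract does not restrict the action set this way), and the function name `thompson_sampling` and all argument names are hypothetical. The posterior update is the standard rank-one conjugate Gaussian update for an observation $y = a^\top \theta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.

```python
import numpy as np


def thompson_sampling(actions, theta_star, mu0, Sigma0, sigma, T, rng):
    """Run Thompson sampling for T rounds and return cumulative regret.

    actions    : (K, d) array, one candidate action per row (assumed finite)
    theta_star : (d,) true coefficient vector
    mu0, Sigma0: prior mean (d,) and prior covariance (d, d)
    sigma      : noise standard deviation
    """
    mu, Sigma = mu0.copy(), Sigma0.copy()
    best = np.max(actions @ theta_star)  # reward of the optimal action
    cumulative = np.zeros(T)
    total = 0.0
    for t in range(T):
        # Sample a coefficient vector from the current posterior.
        theta_tilde = rng.multivariate_normal(mu, Sigma)
        # Play the action that is optimal for the sampled coefficients.
        a = actions[np.argmax(actions @ theta_tilde)]
        y = a @ theta_star + sigma * rng.standard_normal()
        # Rank-one conjugate Gaussian posterior update.
        Sa = Sigma @ a
        denom = sigma**2 + a @ Sa
        mu = mu + Sa * (y - a @ mu) / denom
        Sigma = Sigma - np.outer(Sa, Sa) / denom
        Sigma = (Sigma + Sigma.T) / 2  # keep the covariance symmetric
        total += best - a @ theta_star
        cumulative[t] = total
    return cumulative
```

Bayesian regret averages the final cumulative regret over draws of `theta_star` from the prior $\mathcal{N}(\mu_0, \Sigma_0)$. Repeating the run while scaling `Sigma0` up, and comparing the early rounds against the late ones, is one informal way to look at the burn-in term separately from the long-run $\sigma d \sqrt{T}$ behaviour.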