Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits

Abstract

We study the stochastic linear bandit problem with multiple arms over T rounds, where the covariate dimension d may exceed T, but each arm-specific parameter vector is s-sparse. We begin by analyzing the sequential estimation problem in the single-arm setting, focusing on cumulative mean-squared error. We show that Lasso estimators are provably suboptimal in the sequential setting, exhibiting suboptimal dependence on d and T, whereas thresholded Lasso estimators -- obtained by applying least squares to the support selected by thresholding an initial Lasso estimator -- achieve the minimax rate. Building on these insights, we consider the full linear contextual bandit problem and propose a three-stage arm selection algorithm that uses thresholded Lasso as the main estimation method. We derive an upper bound on the cumulative regret of order s( s)( d + T), and establish a matching lower bound up to a s factor, thereby characterizing the minimax regret rate up to a logarithmic term in s. Moreover, when a short initial period is excluded from the regret, the proposed algorithm achieves exact minimax optimality.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…