Instance-dependent Stochastic Lipschitz bandit
Abstract
We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $X \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\Theta\big(T^{(d+1)/(d+2)}\big)$, or adaptive via the zooming dimension $d_z$, yielding $\Theta\big(T^{(d_z+1)/(d_z+2)}\big)$. However, such zooming-based guarantees are only partially instance-dependent: they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterize the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of the level sets, rather than only to their asymptotic behavior. As a corollary, when the set of maximizers has dimension d > 0, we obtain improved adaptive rates of order $O\big(T^{(d_z+1)/((d_z,d)+2)}\big)$, strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.
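For reference, the worst-case rate $\Theta\big(T^{(d+1)/(d+2)}\big)$ is attained by the classical uniform-discretization baseline, which is not the instance-dependent algorithm described in this abstract. A grid of width $\varepsilon$ has $K = \varepsilon^{-d}$ cells; its discretization error costs about $\varepsilon T$ and running UCB on the cells costs about $\sqrt{K T \log T}$. Balancing the two gives $\varepsilon \approx T^{-1/(d+2)}$ and regret of order $T^{(d+1)/(d+2)}$ up to logarithmic factors. The sketch below assumes $d = 1$, a 1-Lipschitz $f$ on $[0,1]$, Gaussian observation noise, and a known maximum value `f_star` used only to measure pseudo-regret. The function name `uniform_grid_ucb` and its parameters are hypothetical.

```python
import math
import random


def uniform_grid_ucb(f, f_star, T, noise_sd=0.1, seed=0):
    """Hypothetical baseline: UCB1 on a uniform grid of [0, 1] (d = 1).

    Assumptions: f is 1-Lipschitz on [0, 1], each observation is
    f(x) + N(0, noise_sd^2), and f_star = max f is known so that the
    pseudo-regret sum_t (f_star - f(x_t)) can be measured.
    Returns (pseudo_regret, most_pulled_point).
    """
    rng = random.Random(seed)
    # Grid width eps ~ T^(-1/(d+2)) with d = 1, i.e. K ~ T^(1/3) cells.
    K = math.ceil(T ** (1.0 / 3.0))
    arms = [(i + 0.5) / K for i in range(K)]  # cell midpoints
    counts = [0] * K
    sums = [0.0] * K
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1  # initialization: pull every cell once
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            a = max(
                range(K),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        y = f(arms[a]) + rng.gauss(0.0, noise_sd)  # noisy evaluation
        counts[a] += 1
        sums[a] += y
        regret += f_star - f(arms[a])
    best = max(range(K), key=lambda i: counts[i])
    return regret, arms[best]


if __name__ == "__main__":
    def f(x):
        return 1.0 - abs(x - 0.3)  # 1-Lipschitz, maximized at x = 0.3

    T = 20_000
    regret, x_hat = uniform_grid_ucb(f, f_star=1.0, T=T)
    print(f"average regret {regret / T:.3f}, most pulled point {x_hat:.3f}")
```

Because the grid is fixed in advance, this baseline spends the same resolution everywhere. Zooming-style methods instead refine only near-optimal regions, which is what the $d_z$-dependent rates above capture.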