Improved Regret Bounds for Bandits with Expert Advice

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N > K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
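
To give a rough sense of the gap between the two lower bounds, the following comparison ignores constant factors and assumes, purely for illustration, the regime $N = K^2$ (this choice is ours and is not taken from the paper). The ratio of the new bound to the earlier one is

```latex
\[
  \frac{\sqrt{K T \ln(N/K)}}{\sqrt{K T (\ln N)/(\ln K)}}
  = \sqrt{\frac{\ln(N/K)\,\ln K}{\ln N}}
  \;\overset{N = K^2}{=}\;
  \sqrt{\frac{(\ln K)(\ln K)}{2 \ln K}}
  = \sqrt{\frac{\ln K}{2}} .
\]
```

Under this assumption the new lower bound is larger by a factor of order $\sqrt{\ln K}$, which grows with the number of actions.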
