Improved Regret Bounds for Bandits with Expert Advice

Julia Olkhovskaya

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N > K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
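As a rough numerical illustration of the two lower-bound rates mentioned above, the sketch below evaluates them for one choice of parameters. It assumes the rates read as sqrt(K T ln(N/K)) and sqrt(K T ln(N) / ln(K)), with all constants dropped; the function names and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def new_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Assumed rate sqrt(K * T * ln(N / K)), constants omitted."""
    if K < 2 or N <= K:
        raise ValueError("this sketch assumes 2 <= K < N")
    return math.sqrt(K * T * math.log(N / K))


def previous_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Assumed rate sqrt(K * T * ln(N) / ln(K)), constants omitted."""
    if K < 2 or N <= K:
        raise ValueError("this sketch assumes 2 <= K < N")
    return math.sqrt(K * T * math.log(N) / math.log(K))


if __name__ == "__main__":
    # Illustrative values only: many more experts than actions.
    K, N, T = 10, 10**6, 10_000
    print("new rate:     ", new_lower_bound_rate(K, N, T))
    print("previous rate:", previous_lower_bound_rate(K, N, T))
```

With N much larger than K, as in this example, the first expression comes out larger than the second. The numbers only compare growth rates, not actual regret values, since the constants in both bounds are ignored.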
