First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Chen-Yu Wei

Abstract

We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Assuming the $d$-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of $T$ rounds is known to scale as $\tilde{O}(\sqrt{KdT})$. Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order $\tilde{O}(K\sqrt{dV_T})$ in terms of the cumulative second moment of the learner's losses $V_T$, and a closely related first-order bound of order $\tilde{O}(K\sqrt{dL_T^*})$ in terms of the cumulative loss of the best policy $L_T^*$. Since $V_T$ or $L_T^*$ may be significantly smaller than $T$, these improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.
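
A rough comparison may help make "relatively benign" concrete. The sketch below is not taken from the paper. It assumes the bounds have the square-root forms $\sqrt{KdT}$ (worst case) and $K\sqrt{dV_T}$ (second order), ignores constants and logarithmic factors, and further assumes per-round losses lie in $[-1, 1]$ so that $V_T \le T$.

```latex
% Assumed forms (constants and log factors dropped):
%   worst-case bound:    \sqrt{K d T}
%   second-order bound:  K \sqrt{d V_T}
\begin{align*}
K\sqrt{d V_T} \le \sqrt{K d T}
  &\iff K^2 d V_T \le K d T \\
  &\iff V_T \le \frac{T}{K}.
\end{align*}
% At the other extreme, V_T = T gives K\sqrt{dT} = \sqrt{K}\cdot\sqrt{K d T},
% i.e. a factor \sqrt{K} above the worst-case rate.
```

Under these assumptions, the second-order bound is the smaller of the two roughly when $V_T \le T/K$. The same calculation with $L_T^*$ in place of $V_T$ gives the analogous condition $L_T^* \le T/K$ for the first-order bound.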
