Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Tor Lattimore

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the bound of $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
