Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

Chicheng Zhang

Abstract

We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta=0$) to squared hinge loss ($\eta=1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
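
To make the loss family concrete, here is a minimal sketch of one way to interpolate between the two endpoints named in the abstract. The abstract does not give the formula, so the convex combination below is an assumption chosen only because it reduces to hinge loss at eta = 0 and squared hinge loss at eta = 1; the paper's exact parameterization may differ. The names `hinge` and `interpolated_loss` are hypothetical.

```python
def hinge(z: float) -> float:
    """Standard hinge loss on a margin z: max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def interpolated_loss(z: float, eta: float) -> float:
    """Illustrative loss family indexed by eta in [0, 1].

    ASSUMPTION: a plain convex combination of hinge and squared hinge.
    This is not taken from the paper; it only matches the two endpoints
    the abstract describes (eta = 0 and eta = 1).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    h = hinge(z)
    return (1.0 - eta) * h + eta * h * h


if __name__ == "__main__":
    margin = 0.5
    for eta in (0.0, 0.5, 1.0):
        print(f"eta={eta}: loss={interpolated_loss(margin, eta)}")
```

For margins between 0 and 1 the squared term is smaller than the linear term, so in this sketch a larger eta gives a smaller penalty to examples that are correctly classified but inside the margin.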
