Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Abstract
In this paper, we study the problem of stochastic linear bandits with finite action sets. Most existing work assumes that the payoffs are bounded or sub-Gaussian, an assumption that may be violated in some scenarios such as financial markets. To address this issue, we analyze linear bandits with heavy-tailed payoffs, where the payoffs admit finite moments of order $1+\epsilon$ for some $\epsilon \in (0,1]$. Through median of means and dynamic truncation, we propose two novel algorithms which enjoy a sublinear regret bound of $\widetilde{O}(d^{\frac{1}{2}} T^{\frac{1}{1+\epsilon}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Meanwhile, we provide an $\Omega(d^{\frac{\epsilon}{1+\epsilon}} T^{\frac{1}{1+\epsilon}})$ lower bound, which implies that our upper bound matches the lower bound up to polylogarithmic factors in the order of $d$ and $T$ when $\epsilon = 1$. Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms, and the empirical results strongly support our theoretical guarantees.
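As a rough illustration of the two robust estimation ideas named in the abstract, the following minimal sketch shows generic median-of-means and truncated-mean estimators for scalar samples. It is not the paper's bandit algorithms: the function names, the equal-size block split, and the fixed `threshold` are assumptions made for illustration, whereas the abstract describes the truncation as dynamic.

```python
from statistics import mean, median


def median_of_means(samples, num_groups):
    """Split samples into num_groups equal-size blocks, average each block,
    and return the median of the block means (leftover samples are dropped)."""
    block_size = len(samples) // num_groups
    if block_size == 0:
        raise ValueError("need at least num_groups samples")
    block_means = [
        mean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_groups)
    ]
    return median(block_means)


def truncated_mean(samples, threshold):
    """Replace samples whose magnitude exceeds threshold by zero, then average.
    The fixed threshold is an illustrative assumption."""
    clipped = [x if abs(x) <= threshold else 0.0 for x in samples]
    return mean(clipped)


if __name__ == "__main__":
    # Hypothetical payoffs with a single extreme outlier.
    payoffs = [1, 1, 1, 1, 1, 1000]
    print("plain mean:     ", mean(payoffs))
    print("median of means:", median_of_means(payoffs, num_groups=3))
    print("truncated mean: ", truncated_mean(payoffs, threshold=10))
```

Both estimators limit the influence of a single extreme payoff, which is the property that makes them useful when only a $1+\epsilon$ moment is finite.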