Stochastic continuum armed bandit problem of few linear parameters in high dimensions
Abstract
We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_d(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r: B_d(1+\nu) \to \mathbb{R}$ are assumed to depend intrinsically on $k \ll d$ unknown linear parameters, so that $r(x) = g(Ax)$, where $A$ is a full-rank $k \times d$ matrix. Assuming the mean reward function to be smooth, we make use of results from the low-rank matrix recovery literature and derive an efficient randomized algorithm that achieves a regret bound of $O\big(C(k,d)\, n^{\frac{1+k}{2+k}} (\log n)^{\frac{1}{2+k}}\big)$ with high probability. Here $C(k,d)$ is at most polynomial in $d$ and $k$, and $n$ is the number of rounds (the sampling budget), which is assumed to be known beforehand.
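The following Python sketch illustrates two ingredients stated above: the reward model $r(x) = g(Ax)$ on the ball $B_d(1+\nu)$, and the growth rate of the regret bound with the constant $C(k,d)$ dropped. It does not implement the authors' algorithm. The following are all hypothetical choices made only for illustration:

- the helper names `sample_ball`, `make_reward` and `regret_rate`;
- the smooth link $g(y) = -\|y\|_2^2$;
- the Gaussian draw of $A$, which is full rank with probability one;
- the values $d = 100$, $k = 2$ and $\nu = 0.1$.

```python
import math
import random


def sample_ball(d, radius, rng):
    """Draw a point uniformly from the l2 ball of the given radius in R^d (hypothetical helper)."""
    # Uniform direction from a normalized Gaussian vector; radius scaled by U^(1/d).
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    scale = radius * rng.random() ** (1.0 / d) / norm
    return [scale * t for t in v]


def make_reward(A, g):
    """Return r(x) = g(A x) for a k x d matrix A given as a list of k rows (hypothetical helper)."""
    def r(x):
        ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
        return g(ax)
    return r


def regret_rate(n, k):
    """Rate n^{(1+k)/(2+k)} * (log n)^{1/(2+k)} of the stated bound, with C(k, d) omitted."""
    return n ** ((1 + k) / (2 + k)) * math.log(n) ** (1 / (2 + k))


# Illustrative usage with hypothetical parameters.
rng = random.Random(0)
d, k, nu = 100, 2, 0.1
# A random Gaussian k x d matrix; full rank with probability one (assumption, not from the paper).
A = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(k)]
r = make_reward(A, lambda y: -sum(t * t for t in y))  # smooth link g(y) = -||y||^2
x = sample_ball(d, 1 + nu, rng)
print("reward at a random arm:", r(x))
for n in (10**3, 10**5, 10**7):
    # Average per-round regret rate: bound divided by the number of rounds.
    print(n, regret_rate(n, k) / n)
```

Since $(1+k)/(2+k) < 1$, the bound divided by $n$ shrinks as $n$ grows, so the average per-round regret vanishes. The exponent depends on $k$ but not on $d$, which enters only through $C(k,d)$.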