Risk-Averse Multi-Armed Bandit Problems under Mean-Variance Measure

Qing Zhao

doi:10.1109/JSTSP.2016.2592622

Abstract

Multi-armed bandit (MAB) problems have been studied mainly under the measure of expected total reward accrued over a horizon of length $T$. In this paper, we address the issue of risk in MAB problems and develop parallel results under the measure of mean-variance, a risk measure commonly adopted in economics and mathematical finance. We show that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by $\Omega(\log T)$ and $\Omega(T^{2/3})$, respectively. We then show that variations of the UCB (upper confidence bound) policy and the DSEE (deterministic sequencing of exploration and exploitation) policy developed for the classic risk-neutral MAB achieve these lower bounds.
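The abstract does not spell out the mean-variance definition or the index rule. As a minimal sketch, assuming the common definition xi = sigma^2 - rho * mu (variance minus rho times mean, with a risk-tolerance parameter rho, so a lower value is better) and Gaussian arms, the Python below computes an empirical mean-variance and runs a UCB-style index policy that plays the arm with the smallest lower confidence bound on it. The names `mean_variance` and `mv_lcb_policy`, the `bonus` constant, and the confidence width are hypothetical illustrations; this is not the paper's exact UCB variant or its DSEE policy.

```python
import math
import random


def mean_variance(rewards, rho):
    """Empirical mean-variance sigma^2 - rho * mu (assumed definition; lower is better)."""
    n = len(rewards)
    mu = sum(rewards) / n
    var = sum((x - mu) ** 2 for x in rewards) / n
    return var - rho * mu


def mv_lcb_policy(arms, horizon, rho=1.0, bonus=2.0, seed=0):
    """Hypothetical index policy: play every arm once, then always pick the arm
    whose empirical mean-variance minus an exploration width is smallest.

    `arms` is a list of callables that draw one reward from a random.Random.
    `bonus` is an assumed tuning constant, not a value taken from the paper.
    """
    rng = random.Random(seed)
    history = [[] for _ in arms]  # observed rewards per arm
    choices = []
    for t in range(1, horizon + 1):
        if t <= len(arms):
            k = t - 1  # initialization round: sample each arm once
        else:
            # Lower confidence bound on each arm's mean-variance.
            scores = [
                mean_variance(h, rho) - bonus * math.sqrt(math.log(t) / len(h))
                for h in history
            ]
            k = scores.index(min(scores))
        history[k].append(arms[k](rng))
        choices.append(k)
    return choices
```

For example, with `rho=1.0`, an arm drawing from N(1.0, 0.1^2) has mean-variance 0.01 - 1.0 = -0.99, while an arm drawing from N(1.2, 2.0^2) has 4.0 - 1.2 = 2.8; a risk-averse policy should settle on the first arm even though its mean is lower, which a purely mean-maximizing policy would not do.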
