Improved Analysis of UCRL2 with Empirical Bernstein Inequality

Abstract

We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \le S$ next states and diameter $D$, the regret of UCRL2B after $T$ steps is bounded as $\tilde{O}(\sqrt{D \Gamma S A T})$.
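To show how the stated bound scales, the sketch below evaluates only its leading term $\sqrt{D \Gamma S A T}$, where $\Gamma$ is the maximum number of next states ($\Gamma \le S$). It assumes that dropping constants and logarithmic factors is acceptable for an order-of-magnitude illustration. The function name `ucrl2b_regret_order` is hypothetical and is not taken from the paper, and the output is not the actual regret bound.

```python
import math


def ucrl2b_regret_order(D: float, Gamma: int, S: int, A: int, T: int) -> float:
    """Hypothetical helper: leading term sqrt(D * Gamma * S * A * T) of the
    UCRL2B regret bound, with constants and log factors dropped (illustration only)."""
    # Gamma counts reachable next states, so it cannot exceed the number of states S.
    if not 1 <= Gamma <= S:
        raise ValueError("Gamma must satisfy 1 <= Gamma <= S")
    return math.sqrt(D * Gamma * S * A * T)


# Quadrupling the horizon T doubles the leading term, reflecting sqrt(T) growth.
r1 = ucrl2b_regret_order(D=10, Gamma=2, S=20, A=5, T=10_000)
r4 = ucrl2b_regret_order(D=10, Gamma=2, S=20, A=5, T=40_000)
print(r4 / r1)
```

With these parameters the average regret per step, `r1 / 10_000`, shrinks as $T$ grows. This is the sublinear behaviour that a $\sqrt{T}$ bound implies.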
