A Tractable Algorithm For Finite-Horizon Continuous Reinforcement Learning

Abstract

We consider the finite-horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm for the problem based on optimistic value iteration. Second, we give a lower bound on regret of order $T^{2/3}$ for any algorithm that discretizes the state space, improving on the previous regret bound of order $T^{1/2}$ by Ortner and Ryabko for the same problem. Third, under the assumption that the rewards and transitions are Hölder continuous, we show that the discretization error is bounded above by $\mathrm{const} \cdot L n^{-\alpha} T$. Finally, we give some simple experiments to validate our propositions.
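
As a rough illustration of where a bound of this shape can come from, the sketch below assumes (none of this is stated in the abstract) a state space $[0,1]$ split into $n$ equal intervals, with $L$ the Hölder constant, $\alpha$ the Hölder exponent, and $T$ the number of time steps. The paper's actual argument and constants may differ.

```latex
% Assumed Hölder condition on the rewards (and analogously on the transitions):
|r(s,a) - r(s',a)| \le L\,|s - s'|^{\alpha}
% Two states in the same interval satisfy |s - s'| \le 1/n, hence
|r(s,a) - r(s',a)| \le L\,n^{-\alpha}
% Accumulating this per-step error over T steps gives
\sum_{t=1}^{T} L\,n^{-\alpha} = L\,n^{-\alpha}\,T
```

Under these assumptions, a finer grid (larger $n$) shrinks the error term, at the cost of more discrete states to learn.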
