On Convergence of Gradient Expected Sarsa(λ)

Abstract

We study the convergence of Expected Sarsa(λ) with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to Expected Sarsa(λ) is unstable for off-policy learning. Building on the convex-concave saddle-point framework, we propose a convergent Gradient Expected Sarsa(λ) (GES(λ)) algorithm. Our theoretical analysis shows that GES(λ) converges to the optimal solution at a linear convergence rate, comparable to many existing state-of-the-art gradient temporal-difference (GTD) learning algorithms. We also develop a Lyapunov function technique to investigate how the step size influences the finite-time performance of GES(λ); this technique can potentially be generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of GES(λ).
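The saddle-point view can be illustrated with a minimal sketch. It is not the GES(λ) algorithm from this paper: it assumes a small, fully known MDP and exact expectations instead of sampled transitions and eligibility traces. The sketch assumes the target-policy λ-operator T^λQ = (I - γλP_π)^{-1}(r + γ(1 - λ)P_πQ), which, with features Φ and a behaviour weighting D = diag(d_μ), gives the projected fixed-point equation Aθ = b with A = ΦᵀD(I - γλP_π)^{-1}(I - γP_π)Φ and b = ΦᵀD(I - γλP_π)^{-1}r. It then runs deterministic gradient descent-ascent on L(θ, ω) = ωᵀ(b - Aθ) - ½ωᵀMω with M = ΦᵀDΦ; the maximum over ω is ½(b - Aθ)ᵀM^{-1}(b - Aθ), half the mean-squared projected Bellman error (MSPBE). The helper names target_policy_matrix, expected_sarsa_lambda_system and primal_dual_solve, and the toy MDP, are hypothetical.

```python
import numpy as np


def target_policy_matrix(P, pi):
    """Hypothetical helper: state-action transition matrix under the target policy.

    P has shape (n_states, n_actions, n_states) with P[s, a, s'] = P(s' | s, a);
    pi has shape (n_states, n_actions) with pi[s', a'] = pi(a' | s').
    The row/column index of the pair (s, a) is s * n_actions + a.
    """
    n_s, n_a, _ = P.shape
    # P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s')
    return np.einsum("sap,pb->sapb", P, pi).reshape(n_s * n_a, n_s * n_a)


def expected_sarsa_lambda_system(P, R, pi, d_mu, Phi, gamma, lam):
    """Hypothetical helper: A, b, M of the projected lambda-Bellman equation A theta = b.

    Assumes the target-policy lambda-operator
        T Q = (I - gamma*lam*P_pi)^{-1} (r + gamma*(1 - lam) * P_pi Q),
    so that Phi^T D (T(Phi theta) - Phi theta) = b - A theta.
    """
    P_pi = target_policy_matrix(P, pi)
    n = P_pi.shape[0]
    I = np.eye(n)
    r = R.reshape(n)                      # expected reward of each (s, a) pair
    D = np.diag(d_mu)                     # behaviour weighting over (s, a) pairs
    trace_op = np.linalg.inv(I - gamma * lam * P_pi)
    A = Phi.T @ D @ trace_op @ (I - gamma * P_pi) @ Phi
    b = Phi.T @ D @ trace_op @ r
    M = Phi.T @ D @ Phi                   # positive definite if d_mu > 0 and Phi has full column rank
    return A, b, M


def primal_dual_solve(A, b, M, n_iter=50_000, tol=1e-12):
    """Deterministic gradient descent-ascent on
        L(theta, omega) = omega^T (b - A theta) - 0.5 * omega^T M omega,
    whose saddle point is theta* = A^{-1} b, omega* = 0.
    """
    d = A.shape[0]
    # Linear part of the joint update for z = (theta, omega). If A is nonsingular
    # and M is positive definite, every eigenvalue has negative real part.
    H = np.block([[np.zeros((d, d)), A.T], [-A, -M]])
    eig = np.linalg.eigvals(H)
    if np.any(eig.real >= 0):
        raise ValueError("need A nonsingular and M positive definite")
    # With this step size, |1 + alpha*lambda_i|^2 <= 1 + alpha*Re(lambda_i) < 1 for all i,
    # so the iteration converges geometrically (a linear rate).
    alpha = np.min(-eig.real / np.abs(eig) ** 2)
    theta, omega = np.zeros(d), np.zeros(d)
    for k in range(1, n_iter + 1):
        d_theta = alpha * (A.T @ omega)                # descent step in theta
        d_omega = alpha * (b - A @ theta - M @ omega)  # ascent step in omega
        theta, omega = theta + d_theta, omega + d_omega
        if max(np.abs(d_theta).max(), np.abs(d_omega).max()) < tol:
            break
    return theta, k


if __name__ == "__main__":
    # Made-up 2-state, 2-action MDP used only to exercise the sketch.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.05, 0.95]]])
    R = np.array([[1.0, 0.0], [0.0, 2.0]])
    pi = np.array([[0.1, 0.9], [0.2, 0.8]])     # target policy
    d_mu = np.array([0.3, 0.2, 0.25, 0.25])     # behaviour weighting
    gamma, lam = 0.5, 0.5
    A, b, M = expected_sarsa_lambda_system(P, R, pi, d_mu, np.eye(4), gamma, lam)
    theta, iters = primal_dual_solve(A, b, M)
    q_pi = np.linalg.solve(np.eye(4) - gamma * target_policy_matrix(P, pi), R.reshape(4))
    print(iters, np.max(np.abs(theta - q_pi)))
```

With tabular features (Φ = I) the fixed point A^{-1}b equals Q^π = (I - γP_π)^{-1}r for every λ, which gives an easy sanity check. Reading the step size off the spectrum of the joint iteration matrix is only possible because the sketch uses exact expectations; a sampled, trace-based version would need its own step-size analysis.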
