Exploring TD error as a heuristic for σ selection in Q(σ, λ)

Abstract

Among temporal-difference (TD) algorithms, Q(σ, λ) performs multistep backups online while unifying sampling with taking the expectation across all actions in a state. The parameter σ ∈ [0, 1] sets the extent to which sampling is used. Rather than holding σ constant or varying it with time, its value can be selected from characteristics of the current state, such as the TD error. This report explores the viability of such a TD-error-based scheme.
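To make the role of σ concrete, the following is a minimal sketch of a one-step Q(σ) backup target, in which σ = 1 gives a fully sampled (Sarsa-style) target and σ = 0 gives a full expectation (Expected Sarsa-style) target. It omits the eligibility traces that λ introduces. The abstract does not state how the TD error is mapped to σ, so `sigma_from_td_error`, its `scale` parameter, and the direction of the mapping are hypothetical choices made purely for illustration, not the rule the report evaluates.

```python
import numpy as np


def q_sigma_target(q_next, pi_next, a_next, reward, gamma, sigma):
    """One-step Q(sigma) target for a transition into the next state.

    q_next  : action values Q(s', .) as a 1-D array
    pi_next : target-policy probabilities pi(. | s') as a 1-D array
    a_next  : index of the action actually sampled in s'
    sigma   : 1.0 = pure sampling, 0.0 = pure expectation
    """
    q_next = np.asarray(q_next, dtype=float)
    pi_next = np.asarray(pi_next, dtype=float)
    sampled = q_next[a_next]                    # sampled backup
    expected = float(np.dot(pi_next, q_next))   # expectation over all actions
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(td_error, scale=1.0):
    """ASSUMPTION: an illustrative heuristic, not taken from the report.

    Maps the magnitude of the TD error linearly into [0, 1], saturating at 1
    once |td_error| reaches `scale`.
    """
    return float(min(1.0, abs(td_error) / scale))


if __name__ == "__main__":
    q_next = [1.0, 3.0]
    pi_next = [0.5, 0.5]
    sigma = sigma_from_td_error(0.5, scale=2.0)
    print(sigma, q_sigma_target(q_next, pi_next, 1, 1.0, 0.5, sigma))
```

Intermediate values of σ interpolate linearly between the two targets, which is what lets a state-dependent rule shift between sampling and expectation from one step to the next.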
